Abstract. As inductive inference and machine learning methods in computer
science see continued success, researchers are aiming to describe even more complex
probabilistic models and inference algorithms. What are the limits of mechanizing
probabilistic inference? We investigate the computability of conditional
probability, a fundamental notion in probability theory and a cornerstone of
Bayesian statistics, and show that there are computable joint distributions with
noncomputable conditional distributions, ruling out the prospect of general inference
algorithms, even inefficient ones. Specifically, we construct a pair of computable
random variables in the unit interval such that the conditional distribution
of the first variable given the second encodes the halting problem. Nevertheless,
probabilistic inference is possible in many common modeling settings, and
we prove several results giving broadly applicable conditions under which conditional
distributions are computable. In particular, conditional distributions become
computable when measurements are corrupted by independent computable
noise with a sufficiently smooth density.

1. Introduction

The use of probability to reason about uncertainty is key to modern science and
engineering, and the operation of conditioning, used to perform Bayesian inductive
reasoning in probabilistic models, directly raises many of its most important
computational problems. Faced with probabilistic models of increasingly complex
phenomena that stretch or exceed the limitations of existing representations and
algorithms, researchers have proposed new representations and formal languages
for describing joint distributions on large collections of random variables, and have
developed new algorithms for performing automated probabilistic inference. What
are the limits of this endeavor? Can we hope to automate probabilistic reasoning
via a general inference algorithm that can compute conditional probabilities for an
arbitrary computable joint distribution?

We demonstrate that there are computable joint distributions with noncom- 
putable conditional distributions. Of course, the fact that generic algorithms cannot 
exist for computing conditional probabilities does not rule out the possibility that 
large classes of distributions may be amenable to automated inference. The chal- 
lenge for mathematical theory is to explain the widespread success of probabilistic 
methods and characterize the circumstances when conditioning is possible. In this
vein, we describe broadly applicable conditions under which conditional probabilities
are computable.

1.1. Probabilistic programming. Within probabilistic Artificial Intelligence (AI) 
and machine learning, the study of formal languages and algorithms for describing 
and computing answers from probabilistic models is the subject of probabilistic pro- 
gramming. Probabilistic programming languages themselves build on modern pro- 
gramming languages and their facilities for recursion, abstraction, modularity, etc., 
to enable practitioners to define intricate, in some cases infinite-dimensional, mod- 
els by implementing a generative process that produces an exact sample from the 
model's joint distribution. (See, e.g., IBAL [Pfe01], λ○ [PPT08], Church [GMR+08],
and HANSEI [KS09]. For related and earlier efforts, see, e.g., PHA [Poo91],
Infer.NET [MWGK10], and Markov Logic [RD06]. Probabilistic programming languages
have been the focus of a long tradition of research within programming languages, 
model checking and formal methods.) In many of these languages, one can easily 
represent the higher-order stochastic processes (e.g., distributions on data struc- 
tures, distributions on functions, and distributions on distributions) that are essen- 
tial building blocks in modern nonparametric Bayesian statistics. In fact, the most 
expressive such languages are each capable of describing the same robust class as 
the others — the class of computable distributions, which delineates those from which 
a probabilistic Turing machine can sample to arbitrary accuracy. 

Traditionally, inference algorithms for probabilistic models have been derived and 
implemented by hand. In contrast, probabilistic programming systems have intro- 
duced varying degrees of support for computing conditional distributions. Given the 
rate of progress toward broadening the scope of these algorithms, one might hope 
that there would eventually be a generic algorithm supporting the entire class of 
computable distributions. 

Despite recent progress towards a general such algorithm, support for conditioning 
with respect to continuous random variables has remained ad-hoc and incomplete. 
Our results explain why this is necessarily the case. 

1.2. Computable Distributions. In order to characterize the computational lim- 
its of probabilistic inference, we work within the framework of computable probability 
theory, which pertains to the computability of distributions, random variables, and 
probabilistic operations; and builds on the classical computability theory of deter- 
ministic functions. Just as the notion of a Turing machine allows one to prove 
results about discrete computations performed using an arbitrary (sufficiently pow- 
erful) programming language, the notion of a probabilistic Turing machine provides 
a basis for precisely describing the operations that probabilistic programming lan- 
guages are capable of performing. 

The tools for describing computability in this setting are drawn from the theory 
of computable metric spaces, within the subject of computable analysis. This theory 
gives us the ability to study distributions on arbitrary computable metric spaces, 
including, e.g., distributions on distributions. In Section 2 we present the necessary 
definitions and results from computable probability theory.

1.3. Conditional Probability. For an experiment with a discrete set of outcomes,
computing conditional probabilities is, in principle, straightforward as it is simply a 
ratio of probabilities. However, in the case of conditioning on the value of a continu- 
ous random variable, this ratio is undefined. Furthermore, in modern Bayesian sta- 
tistics, and especially the probabilistic programming setting, it is common to place 
distributions on higher-order objects, and so one is already in a situation where el- 
ementary notions of conditional probability are insufficient and more sophisticated 
measure-theoretic notions are necessary. 
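
To make the contrast concrete, here is a minimal Python sketch of the elementary discrete case, where conditioning really is just a ratio of probabilities. The function name `conditional_probability` and the two-dice example are illustrative assumptions, not part of the paper. The sketch also shows where the elementary definition stops: when the conditioning event has probability zero, as for a single value of a continuous variable, the ratio is undefined.

```python
from fractions import Fraction

def conditional_probability(joint, event_a, event_b):
    """Return P(A | B) = P(A and B) / P(B) for a finite joint distribution.

    `joint` maps outcomes to exact rational probabilities; `event_a` and
    `event_b` are predicates on outcomes (hypothetical interface).
    """
    p_b = sum(p for w, p in joint.items() if event_b(w))
    if p_b == 0:
        # The elementary ratio is undefined for probability-zero events.
        raise ZeroDivisionError("conditioning event has probability zero")
    p_ab = sum(p for w, p in joint.items() if event_a(w) and event_b(w))
    return Fraction(p_ab) / Fraction(p_b)

# Two fair dice: outcomes are pairs (d1, d2), each with probability 1/36.
two_dice = {(a, b): Fraction(1, 36) for a in range(1, 7) for b in range(1, 7)}

# P(first die is 6 | sum is at least 10)
p = conditional_probability(two_dice,
                            lambda w: w[0] == 6,
                            lambda w: w[0] + w[1] >= 10)
```

Exact rational arithmetic is used so that the ratio is computed without rounding error.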

Kolmogorov [Kol33] gave an axiomatic characterization of conditional probabil- 
ities, but this definition provides no recipe for their calculation. Other issues also 
arise: In this setting, conditional probabilities are formalized as measurable func- 
tions that are defined only up to measure zero sets. Therefore, without additional 
assumptions, a conditional probability is not necessarily well-defined for any partic- 
ular value of the conditioning random variable. This has long been understood as 
a challenge for statistical applications, in which one wants to evaluate conditional 
probabilities given particular values for observed random variables. In this paper, 
we are therefore especially interested in situations where it makes sense to ask for 
the conditional distribution given a particular point. In particular, we focus on the 
case when conditional distributions are everywhere or almost everywhere continu- 
ous, and thus can be given a unique definition for individual points in the support 
of the underlying measure. As we will argue, this is necessary if there is to be any 
hope of conditioning being computable. 

Under certain conditions, such as when conditional densities exist, conditioning 
can proceed using the classic Bayes' rule; however, it may not be possible to compute 
the density of a computable distribution (if the density exists at all). The probability 
and statistics literature contains many ad-hoc techniques for calculating conditional 
probabilities in special circumstances, and this state of affairs motivated much work 
on constructive definitions (such as those due to Tjur [Tju74], [Tju75], [Tju80], 
Pfanzagl [Pfa79], and Rao [Rao88], [Rao05]), but this work has often not been 
sensitive to issues of computability. 

We recall the basics of the measure-theoretic approach to conditional distributions 
in Section 3, and in Section 4 we use notions from computable probability theory 
to consider the sense in which conditioning could be potentially computable. 



1.4. Other Related Work. Conditional probabilities for computable distributions 
on finite, discrete sets are clearly computable, but may not be efficiently so. In 
this finite discrete setting, there are already interesting questions of computational 
complexity, which have been explored by a number of authors through extensions 
of Levin's theory of average-case complexity [Lev86]. For example, under crypto- 
graphic assumptions, it is difficult to sample from the conditional distribution of 
a uniformly-distributed binary string of length n given its image under a one-way 
function. This can be seen to follow from the work of Ben-David, Chor, Goldre- 
ich, and Luby [BCGL92] in their theory of polynomial-time samplable distributions, 
which has since been extended by Yamakami [Yam99] and others. Extending these
complexity results to the more general setting considered here could bear on the
practice of statistical AI and machine learning.

Osherson, Stob, and Weinstein [OSW88] study learning theory in the setting of 
identifiability in the limit (see [Gol67] and [Put65] for more details on this setting) 
and prove that a certain type of "computable Bayesian" learner fails to identify 
the index of a (computably enumerable) set that is "computably identifiable" in the 
limit. More specifically, a so-called "Bayesian learner" is required to return an index 
for a set with the highest conditional probability given a finite prefix of an infinite 
sequence of random draws from the unknown set. An analysis by Roy [Roy11]
of their construction reveals that the conditional distribution of the index given 
the infinite sequence is an everywhere discontinuous function (on every measure one 
set), hence noncomputable for much the same reason as our elementary construction 
involving a mixture of measures concentrated on the rationals and on the irrationals 
(see Section 5). As we argue, it is more appropriate to study the operator when it is 
restricted to those random variables whose conditional distributions admit versions 
that are continuous everywhere, or at least on a measure one set. 

Our work is distinct from the study of conditional distributions with respect to 
priors that are universal for partial computable functions (as defined using Kolmogorov
complexity) by Solomonoff [Sol64], Zvonkin and Levin [ZL70], and Hutter
[Hut07]. The computability of conditional distributions also has a rather different
character in Takahashi's work on the algorithmic randomness of points defined using
universal Martin-Löf tests [Tak08]. The objects with respect to which one is conditioning
in these settings are typically computably enumerable, but not computable.
In the present paper, we are interested in the problem of computing conditional
distributions of random variables that are computable, even though the conditional
distribution may itself be noncomputable.

In the most abstract setting, conditional probabilities can be constructed as 
Radon-Nikodym derivatives. In work motivated by questions in algorithmic randomness,
Hoyrup and Rojas [HR11] study notions of computability for absolute
continuity and for Radon-Nikodym derivatives as elements in $L^1$, i.e., the space
of integrable functions. They demonstrate that there are computable measures
whose Radon-Nikodym derivatives are not computable as points in $L^1$, but these
counterexamples do not correspond with conditional probabilities of computable
random variables. Hoyrup, Rojas and Weihrauch [HRW11] show an equivalence
between the problem of computing general Radon-Nikodym derivatives as elements
in $L^1$ and computing the characteristic function of computably enumerable sets.
However, conditional probabilities are a special case of Radon-Nikodym derivatives,
and moreover, a computable element in $L^1$ is not well-defined at points, and so
is not ideal for statistical purposes. Using their machinery, we demonstrate the
non-$L^1$-computability of our main construction. But the main goal of our paper is
to provide a detailed analysis of the situation where it makes sense to ask for the 
conditional probability at points, which is the more relevant scenario for statistical 
inference. 



ON THE COMPUTABILITY OF CONDITIONAL PROBABILITY 5 

1.5. Summary of Results. Following our presentation of computable probability 
theory and conditional probability in Sections 2 through 4, we provide our main 
positive and negative results about the computability of conditional probability, 
which we now summarize. 

In Proposition 5.1, we construct a pair (X, C) of computable random variables 
such that every version of the conditional probability P[C = 1|X] is discontinuous 
everywhere, even when restricted to a $\mathbf{P}_{\mathsf{X}}$-measure one subset. (We make these
notions precise in Section 4.) The construction makes use of the elementary fact that 
the indicator function for the rationals in the unit interval — the so-called Dirichlet 
function — is itself nowhere continuous. 

Because every function computable on a domain D is continuous on D, discon- 
tinuity is a fundamental barrier to computability, and so this construction rules 
out the possibility of a completely general algorithm for conditioning. A natural 
question is whether conditioning is a computable operation when we restrict the 
operator to random variables for which some version of the conditional distribution 
is continuous everywhere, or at least on a measure one set. 

Our central result, Theorem 6.7, states that conditioning is not a computable
operation on computable random variables, even in this restricted setting. We 
construct a pair (X, N) of computable random variables such that there is a version 
of the conditional distribution P[N|X] that is continuous on a measure one set, but no 
version of P[N|X] is computable. (Indeed, every individual conditional probability 
fails to be even lower semicomputable on any set of sufficient measure.) Moreover, 
the noncomputability of P[N|X] is at least as hard as the halting problem, in that
if some oracle A computes P[N|X], then A computes the halting problem. The 
construction involves encoding the halting times of all Turing machines into the 
conditional distribution P[N|X], while ensuring that the joint distribution remains 
computable. 

In Theorem 7.6 we strengthen our central result by constructing a pair of com- 
putable random variables whose conditional distribution is noncomputable but has 
an everywhere continuous version with infinitely differentiable conditional proba- 
bilities. This construction proceeds by smoothing out the distribution constructed 
in Theorem 6.7, but in such a way that one can still compute the halting problem 
relative to the conditional distribution. 

Despite the noncomputability of conditioning in general, conditional distributions 
are often computable in practice. We provide some explanation of this phenome- 
non by characterizing several circumstances in which conditioning is a computable 
operation. Under suitable computability hypotheses, conditioning is computable in 
the discrete setting (Lemma 8.1) and where there is a conditional density (Corol- 
lary 8.8). 

We also characterize a situation in which conditioning is possible in the presence of 
noisy data, capturing many natural models in science and engineering. Let U, V and 
E be computable random variables, and suppose that $\mathbf{P}_{\mathsf{E}}$ is absolutely continuous
with a bounded computable density $p_{\mathsf{E}}$ and E is independent of U and V. We
can think of U + E as the corruption of an idealized measurement U by an independent
source of additive error E. In Corollary 8.9, we show that the conditional distribution
P[(U, V) | U + E] is computable (even if P[(U, V) | U] is not). Finally, we discuss how
symmetry can contribute to the computability of conditional distributions. 

2. Computable Probability Theory 

We now give some background on computable probability theory, which will en- 
able us to formulate our results. The foundations of the theory include notions 
of computability for probability measures developed by Edalat [Eda96], Weihrauch
[Wei99], Schroeder [Sch07b], and Gács [Gac05]. Computable probability theory itself
builds on notions and results in computable analysis. For a general introduction
to this approach to real computation, see Weihrauch [Wei00], Braverman [Bra05] or
Braverman and Cook [BC06].

2.1. Computable and C.e. Reals. We first recall some elementary definitions 
from computability theory (see, e.g. Rogers [Rog87, Ch. 5]). We say that a set of 
natural numbers (potentially in some correspondence with, e.g., rationals, integers, 
or other finitely describable objects with an implicit enumeration) is computably 
enumerable (c.e.) when there is a computer program that outputs every element 
of the set eventually. We say that a sequence of sets $\{B_n\}$ is c.e. uniformly in $n$
when there is a computer program that, on input $n$, outputs every element of $B_n$
eventually. A set is co-c.e. when its complement is c.e. (and so the (uniformly)
computable sets are precisely those that are both (uniformly) c.e. and co-c.e.).
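
As an informal illustration of "outputs every element of the set eventually", the following Python sketch enumerates a set by dovetailing: at stage s it runs the first s candidate computations for s steps each and emits any that have halted. The choice of set (integers whose Collatz iteration reaches 1) and all function names are assumptions made for illustration only. The point is the enumeration pattern: membership is confirmed by a halting computation, but no stage ever certifies non-membership.

```python
from itertools import count, islice

def reaches_one_within(n, steps):
    """Run the Collatz iteration from n for at most `steps` steps."""
    for _ in range(steps):
        if n == 1:
            return True
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    return n == 1

def enumerate_collatz_set():
    """Enumerate {n >= 1 : the Collatz iteration from n reaches 1}.

    Dovetailing: stage s gives each of the candidates 1..s a budget of
    s steps, so every member is emitted at some finite stage.
    """
    emitted = set()
    for stage in count(1):
        for n in range(1, stage + 1):
            if n not in emitted and reaches_one_within(n, stage):
                emitted.add(n)
                yield n

# The first few elements, in the order the enumeration discovers them.
first = list(islice(enumerate_collatz_set(), 8))
```

Note that elements need not appear in increasing order; an enumeration only promises that each member appears eventually.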

We now recall basic notions of computability for real numbers (see, e.g., [Wei00,
Ch. 4.2] or [Nie09, Ch. 1.8]). We say that a real $r$ is a c.e. real when the set
of rationals $\{q \in \mathbb{Q} : q < r\}$ is c.e. Similarly, a co-c.e. real is one for which
$\{q \in \mathbb{Q} : q > r\}$ is c.e. (C.e. and co-c.e. reals are sometimes called left-c.e. and
right-c.e. reals, respectively.) A real $r$ is computable when it is both c.e. and
co-c.e. Equivalently, a real is computable when there is a program that approximates
it to any given accuracy (e.g., given an integer $k$ as input, the program reports a
rational that is within $2^{-k}$ of the real). A function $f : \mathbb{N} \to \mathbb{R}$ is lower (upper)
semicomputable when $f(n)$ is a c.e. (co-c.e.) real, uniformly in $n$ (or more precisely,
when the rational lower (upper) bounds of $f(n)$ are c.e. uniformly in $n$). The
function $f$ is computable if and only if it is both lower and upper semicomputable.
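
The "program that approximates it to any given accuracy" can be sketched in Python for one familiar real. The example below, with the hypothetical name `sqrt2_approx`, approximates the square root of 2 by bisection using exact rationals. The successive values of `lo` are rational lower bounds (witnessing that the real is c.e.) and the values of `hi` are rational upper bounds (witnessing that it is co-c.e.). It assumes `k` is a nonnegative integer.

```python
from fractions import Fraction

def sqrt2_approx(k):
    """Return a rational q with |q - sqrt(2)| < 2**-k, for an integer k >= 0."""
    lo, hi = Fraction(1), Fraction(2)        # invariant: lo**2 < 2 < hi**2
    while hi - lo >= Fraction(1, 2 ** k):
        mid = (lo + hi) / 2
        if mid * mid < 2:                    # mid**2 == 2 is impossible for rational mid
            lo = mid
        else:
            hi = mid
    # sqrt(2) lies strictly between lo and hi, and hi - lo < 2**-k.
    return lo

q = sqrt2_approx(20)
```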

2.2. Computable Metric Spaces. Computable metric spaces, as developed in 
computable analysis [Hem02], [Wei93] and effective domain theory [JB97], [EH98], 
provide a convenient framework for formulating results in computable probability 
theory. For consistency, we largely use definitions from [HR09a] and [GHR10].
Additional details about computable metric spaces can also be found in [Wei00,
Ch. 8.1] and [Gac05, §B.3].

Definition 2.1 (Computable metric space [GHR10, Def. 2.3.1]). A computable
metric space is a triple $(S, \delta, \mathcal{D})$ for which $\delta$ is a metric on the set $S$ satisfying

(1) $(S, \delta)$ is a complete separable metric space;

(2) $\mathcal{D} = \{s_i\}_{i \in \mathbb{N}}$ is an enumeration of a dense subset of $S$, called ideal points;
and,

(3) the real numbers $\delta(s_i, s_j)$ are computable, uniformly in $i$ and $j$.

Let $B(s_i, q_j)$ denote the ball of radius $q_j$ centered at $s_i$. We call

$\mathcal{B}_S := \{B(s_i, q_j) : s_i \in \mathcal{D},\ q_j \in \mathbb{Q},\ q_j > 0\}$ (1)

the ideal balls of $S$, and fix the canonical enumeration of them induced by that of
$\mathcal{D}$ and $\mathbb{Q}$.

For example, the set $\{0,1\}$ is a computable metric space under the discrete metric,
characterized by $\delta(0, 1) = 1$. Cantor space, the set $\{0,1\}^\infty$ of infinite binary
sequences, is a computable metric space under its usual metric and the dense set of
eventually constant strings (under a standard enumeration of finite strings). The
set $\mathbb{R}$ of real numbers is a computable metric space under the Euclidean metric with
the dense set $\mathbb{Q}$ of rationals (under its standard enumeration).

We let $\mathcal{B}_S$ denote the Borel $\sigma$-algebra on a metric space $S$, i.e., the $\sigma$-algebra
generated by the open balls of $S$. In this paper, measurable functions will always
be with respect to the Borel $\sigma$-algebra of a metric space.

Definition 2.2 (Computable point [GHR10, Def. 2.3.2]). Let $(S, \delta, \mathcal{D})$ be a computable
metric space. A point $x \in S$ is computable when there is a program that
enumerates a sequence $\{x_i\}$ in $\mathcal{D}$ where $\delta(x_i, x) < 2^{-i}$ for all $i$. We call such a
sequence $\{x_i\}$ a representation of the point $x$.

Remark 2.3. A real $\alpha \in \mathbb{R}$ is computable (as in Section 2.1) if and only if $\alpha$ is
a computable point of $\mathbb{R}$ (as a computable metric space). Although most of the
familiar reals are computable, there are only countably many computable reals, and
so almost every real is not computable.

The notion of a c.e. open set (or $\Sigma^0_1$ class) is fundamental in classical computability
theory, and admits a simple definition in an arbitrary computable metric space.

Definition 2.4 (C.e. open set [GHR10, Def. 2.3.3]). Let $(S, \delta, \mathcal{D})$ be a computable
metric space with the corresponding enumeration $\{B_i\}_{i \in \mathbb{N}}$ of the ideal open balls
$\mathcal{B}_S$. We say that $U \subseteq S$ is a c.e. open set when there is some c.e. set $E \subseteq \mathbb{N}$ such
that $U = \bigcup_{i \in E} B_i$.

Note that the class of c.e. open sets is closed under computable unions and finite
intersections.

A computable function can be thought of as a continuous function whose local 
modulus of continuity is witnessed by a program. It is important to consider the 
computability of partial functions, because many natural and important random 
variables are continuous only on a measure one subset of their domain. 

Definition 2.5 (Computable partial function [GHR10, Def. 2.3.6]). Let $(S, \delta_S, \mathcal{D}_S)$
and $(T, \delta_T, \mathcal{D}_T)$ be computable metric spaces, the latter with the corresponding
enumeration $\{B_n\}_{n \in \mathbb{N}}$ of the ideal open balls $\mathcal{B}_T$. A function $f : S \to T$ is said to
be computable on $R \subseteq S$ when there is a computable sequence $\{U_n\}_{n \in \mathbb{N}}$ of c.e.
open sets $U_n \subseteq S$ such that $f^{-1}[B_n] \cap R = U_n \cap R$ for all $n \in \mathbb{N}$. We call such a
sequence $\{U_n\}_{n \in \mathbb{N}}$ a witness to the computability of $f$.

In particular, if $f$ is computable on $R$, then the inverse images of c.e. open sets
are c.e. open (in $R$) sets, and so we can see computability as a natural restriction
on continuity.

Remark 2.6. Let $S$ and $T$ be computable metric spaces. If $f : S \to T$ is computable
on some subset $R \subseteq S$, then for every computable point $x \in R$, the point $f(x)$ is also
computable. One can show that $f$ is computable on $R$ when there is a program that
uniformly transforms representations of points in $R$ to representations of points in
$T$. (For more details, see [HR09a, Prop. 3.3.2].)

Remark 2.7. Suppose that $f : S \to T$ is computable on $R \subseteq S$ with $\{U_n\}_{n \in \mathbb{N}}$
a witness to the computability of $f$. One can show that there is an effective $G_\delta$
set $R' \supseteq R$ and a function $f' : S \to T$ such that $f'$ is computable on $R'$, the
restriction of $f'$ to $R$ and $f$ are equal as functions, and $\{U_n\}_{n \in \mathbb{N}}$ is a witness to the
computability of $f'$. Furthermore, a $G_\delta$-code for $R'$ can be chosen uniformly in the
witness $\{U_n\}_{n \in \mathbb{N}}$. One could consider such an $f'$ to be a canonical representative of
the computable partial function $f$ with witness $\{U_n\}_{n \in \mathbb{N}}$. Note, however, that the
$G_\delta$-set chosen depends not just on $f$, but also on the witness $\{U_n\}_{n \in \mathbb{N}}$. In particular,
it is possible that two distinct witnesses to the computability of $f$ could result in
distinct $G_\delta$-sets.

2.3. Computable Random Variables. Intuitively, a random variable maps an 
input source of randomness to an output, inducing a distribution on the output 
space. Here we will use a sequence of independent fair coin flips as our source of 
randomness. We formalize this via the probability space $(\{0,1\}^\infty, \mathcal{F}, \mathbf{P})$, where
$\{0,1\}^\infty$ is the product space of infinite binary sequences, $\mathcal{F}$ is its Borel $\sigma$-algebra
(generated by the set of basic clopen cylinders extending each finite binary sequence),
and $\mathbf{P}$ is the product measure of the uniform distribution on $\{0,1\}$. Henceforth we
will take $(\{0,1\}^\infty, \mathcal{F}, \mathbf{P})$ to be the basic probability space, unless otherwise stated.

For a measure space $(\Omega, \mathcal{G}, \mu)$, a set $E \in \mathcal{G}$ is a $\mu$-null set when $\mu E = 0$. More
generally, for $p \in [0, \infty]$, we say that $E$ is a $\mu$-measure $p$ set when $\mu E = p$. A
relation between functions on $\Omega$ is said to hold $\mu$-almost everywhere (abbreviated
$\mu$-a.e.) if it holds for all $\omega \in \Omega$ outside of a $\mu$-null set. When $\mu$ is a probability
measure, then we may instead say that the relation holds for $\mu$-almost all $\omega$ (abbreviated
$\mu$-a.a.). We say that an event $E \in \mathcal{G}$ occurs $\mu$-almost surely (abbreviated
$\mu$-a.s.) when $\mu E = 1$. In each case, we may drop the prefix $\mu$- when it is clear from
context (in particular, when it holds of $\mathbf{P}$).

We will use a SANS SERIF font for random variables. 

Definition 2.8 (Random variable and its distribution). Let $S$ be a computable
metric space. A random variable in $S$ is a function $\mathsf{X} : \{0,1\}^\infty \to S$ that is
measurable with respect to the Borel $\sigma$-algebras of $\{0,1\}^\infty$ and $S$. For a measurable
subset $A \subseteq S$, we let $\{\mathsf{X} \in A\}$ denote the inverse image

$\mathsf{X}^{-1}[A] = \{\omega \in \{0,1\}^\infty : \mathsf{X}(\omega) \in A\}$, (2)

and for $x \in S$ we similarly define the event $\{\mathsf{X} = x\}$. We will write $\mathbf{P}_{\mathsf{X}}$ for the
distribution of $\mathsf{X}$, which is the measure on $S$ defined by $\mathbf{P}_{\mathsf{X}}(\,\cdot\,) := \mathbf{P}\{\mathsf{X} \in \cdot\,\}$.

Definition 2.9 (Computable random variable). Let $S$ be a computable metric
space. Then a random variable $\mathsf{X}$ in $S$ is a computable random variable when
$\mathsf{X}$ is computable on some $\mathbf{P}$-measure one subset of $\{0,1\}^\infty$.

More generally, for a probability measure $\mu$ and function $f$ between computable
metric spaces, we say that $f$ is $\mu$-almost computable when it is computable on
a $\mu$-measure one set. (See [HR09a] for further development of the theory of almost
computable functions.)

Intuitively, $\mathsf{X}$ is a computable random variable when there is a program that,
given access to an oracle bit tape $\omega \in \{0,1\}^\infty$, outputs a representation of the point
$\mathsf{X}(\omega)$ (i.e., enumerates a sequence $\{x_i\}$ in $\mathcal{D}$ where $\delta(x_i, \mathsf{X}(\omega)) < 2^{-i}$ for all $i$), for
all but a measure zero subset of bit tapes $\omega \in \{0,1\}^\infty$.

Even though the source of randomness is a sequence of discrete bits, there are
computable random variables with continuous distributions, such as a uniform random
variable (obtained by subdividing the interval according to the random bit tape)
or an i.i.d.-uniform sequence (by splitting up the given element of $\{0,1\}^\infty$ into countably
many disjoint subsequences and dovetailing the constructions). (For details,
see [FR10, Ex. 3, 4].) All of the standard distributions (standard normal, uniform,
geometric, exponential, etc.) found in probability textbooks, as well as the transformations
of these distributions by computable (or almost computable) functions, are
easily shown to be computable distributions.
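
The uniform random variable mentioned above can be sketched as a program that reads a bit tape and emits a representation in the sense of Definition 2.2. This is an illustrative Python sketch, not the construction of [FR10]: the names `uniform_representation` and `fair_coin_tape` are assumptions, and a seeded pseudo-random generator merely stands in for the sequence of independent fair coin flips. Reading the bits as the binary expansion of a real U in [0, 1], the i-th output is the partial sum over the first i+1 bits, which is within 2^-(i+1) of U.

```python
from fractions import Fraction
from itertools import islice
import random

def uniform_representation(bits):
    """Yield rationals x_0, x_1, ... with |x_i - U| <= 2**-(i+1) < 2**-i,

    where U = sum_j bits[j] * 2**-(j+1) is the real coded by the bit tape.
    """
    x = Fraction(0)
    for j, b in enumerate(bits):
        x += Fraction(b, 2 ** (j + 1))       # subdivide the current interval
        yield x

def fair_coin_tape(seed):
    """Stand-in for the oracle bit tape (assumption: pseudo-random bits)."""
    rng = random.Random(seed)
    while True:
        yield rng.getrandbits(1)

# Ten successively finer approximations to one draw of the uniform variable.
approximations = list(islice(uniform_representation(fair_coin_tape(0)), 10))
```

Each approximation depends on only finitely many bits of the tape, which is the sense in which the map from tapes to reals is computable.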

It is crucial that we consider random variables that are computable only on a $\mathbf{P}$-measure
one subset of $\{0,1\}^\infty$. Consider the following example: For a real $\alpha \in [0,1]$,
we say that a binary random variable $\mathsf{X} : \{0,1\}^\infty \to \{0,1\}$ is a Bernoulli($\alpha$)
random variable when $\mathbf{P}_{\mathsf{X}}\{1\} = \alpha$. There is a Bernoulli(1/2) random variable that is
computable on all of $\{0,1\}^\infty$, given by the program that simply outputs the first bit
of the input sequence. Likewise, when $\alpha$ is dyadic (i.e., a rational with denominator
a power of 2), there is a Bernoulli($\alpha$) random variable that is computable on all of
$\{0,1\}^\infty$. However, this is not possible for any other choices of $\alpha$ (e.g., 1/3).

Proposition 2.10. Let $\alpha \in [0,1]$ be a nondyadic real. Every Bernoulli($\alpha$) random
variable $\mathsf{X} : \{0,1\}^\infty \to \{0,1\}$ is discontinuous, hence not computable on all of
$\{0,1\}^\infty$.

Proof. Assume $\mathsf{X}$ is continuous. Let $Z_0 := \mathsf{X}^{-1}(0)$ and $Z_1 := \mathsf{X}^{-1}(1)$. Then
$\{0,1\}^\infty = Z_0 \cup Z_1$, and so both are closed (as well as open). The compactness
of $\{0,1\}^\infty$ implies that these closed subspaces are also compact, and so $Z_0$ and $Z_1$
can each be written as the finite disjoint union of clopen basis elements. But each
of these elements has dyadic measure, hence their sum cannot be either $\alpha$ or $1 - \alpha$,
contradicting the fact that $\mathbf{P}(Z_1) = 1 - \mathbf{P}(Z_0) = \alpha$. □

On the other hand, for an arbitrary computable $\alpha \in [0,1]$, consider the random
variable $\mathsf{X}_\alpha$ given by $\mathsf{X}_\alpha(x) = 1$ if $\sum_{i=0}^{\infty} x_i 2^{-i-1} < \alpha$ and $0$ otherwise. This construction,
due to [Man73], is a Bernoulli($\alpha$) random variable and is computable on every
point of $\{0,1\}^\infty$ other than a binary expansion of $\alpha$. Not only are these random
variables computable, but they can be shown to be optimal in their use of input bits,
via the classic analysis of rational-weight coins by Knuth and Yao [KY76]. Hence
it is natural to admit as computable random variables those measurable functions
that are computable only on a $\mathbf{P}$-measure one subset of $\{0,1\}^\infty$, as we have done.
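
The comparison-based construction above can be sketched in Python as follows. This is a hedged illustration rather than the construction of [Man73] verbatim: the function name `bernoulli`, the interface `alpha_approx(k)` (assumed to return a rational within 2^-k of the parameter), and the `max_bits` cutoff are all assumptions of the sketch. The program reads bits until the interval of reals consistent with the bits read so far lies strictly on one side of the parameter. On a binary expansion of the parameter itself no finite prefix ever settles the comparison, which is exactly the measure zero set on which the random variable is not computable; the cutoff exists only so that the sketch terminates there.

```python
from fractions import Fraction
from itertools import cycle

def bernoulli(alpha_approx, bits, max_bits=64):
    """Return 1 if the real u coded by `bits` is < alpha, 0 if u > alpha.

    `alpha_approx(k)` must return a rational within 2**-k of alpha.
    Returns None if `max_bits` bits do not settle the comparison; an
    idealized program would instead keep reading forever in that case.
    """
    lo = Fraction(0)
    for n, b in enumerate(bits, start=1):
        if n > max_bits:
            break
        lo += Fraction(b, 2 ** n)            # after n bits, u lies in [lo, hi]
        eps = Fraction(1, 2 ** n)
        hi = lo + eps
        a = alpha_approx(n)                  # |a - alpha| < eps
        if hi < a - eps:                     # then u <= hi < alpha
            return 1
        if lo > a + eps:                     # then u >= lo > alpha
            return 0
    return None

# Example parameter 1/3, which is nondyadic; its approximations are exact here.
third = lambda k: Fraction(1, 3)

low_tape = bernoulli(third, iter([0, 0, 0]))    # tape codes a real below 1/3
high_tape = bernoulli(third, iter([1, 1]))      # tape codes a real above 1/3
stuck = bernoulli(third, cycle([0, 1]))         # 0.010101... is 1/3 in binary
```

Only strict inequalities between rationals are ever tested, so the same sketch applies to any parameter for which rational approximations can be computed.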

2.4. Computable Probability Measures. In this section, we introduce the class 
of computable probability measures and describe their connection with computable 
random variables. 

Let $(S, \delta_S, \mathcal{D}_S)$ be a computable metric space, and let $\mathcal{B}(S)$ denote its Borel sets.
We will denote by $\mathcal{M}(S)$ the set of (Borel) measures on $S$ and by $\mathcal{M}_1(S)$ the subset
which are probability measures. Consider the subset $\mathcal{D}_P \subseteq \mathcal{M}_1(S)$ comprised of
those probability measures that are concentrated on a finite subset of $\mathcal{D}_S$ and where
the measure of each atom is rational, i.e., $\nu \in \mathcal{D}_P$ if and only if $\nu = \sum_{i=1}^{k} q_i \delta_{t_i}$ for
some rationals $q_i \ge 0$ such that $\sum_{i=1}^{k} q_i = 1$ and some points $t_i \in \mathcal{D}_S$, where $\delta_{t_i}$
denotes the Dirac delta on $t_i$. Gács [Gac05, §B.6.2] shows that $\mathcal{D}_P$ is dense in the
Prokhorov metric $\delta_P$ given by

$\delta_P(\mu, \nu) := \inf \{\varepsilon > 0 : \forall A \in \mathcal{B}(S),\ \mu(A) \le \nu(A^\varepsilon) + \varepsilon\}$, (3)

where

$A^\varepsilon := \{p \in S : \exists q \in A,\ \delta_S(p, q) < \varepsilon\} = \bigcup_{p \in A} B_\varepsilon(p)$ (4)

is the $\varepsilon$-neighborhood of $A$ and $B_\varepsilon(p)$ is the open ball of radius $\varepsilon$ about $p$. Moreover,
$(\mathcal{M}_1(S), \delta_P, \mathcal{D}_P)$ is a computable metric space. (See also [HR09a, Prop. 4.1.1].) We
say that $\mu \in \mathcal{M}_1(S)$ is a computable (Borel) probability measure when $\mu$ is a
computable point in $\mathcal{M}_1(S)$ as a computable metric space. Note that the measure
$\mathbf{P}$ on $\{0,1\}^\infty$ is a computable probability measure.

We can characterize the class of computable probability measures in terms of the 
computability of the measure of open sets. 

Theorem 2.11 ([HR09a, Thm. 4.2.1]). Let $(S, \delta_S, \mathcal{D}_S)$ be a computable metric
space. A probability measure $\mu \in \mathcal{M}_1(S)$ is computable if and only if the measure
$\mu(A)$ of a c.e. open set $A \subseteq S$ is a c.e. real, uniformly in $A$.

Definition 2.12 (Computable probability space [GHR10, Def. 2.4.1]). A computable
probability space is a pair $(S, \mu)$ where $S$ is a computable metric space
and $\mu$ is a computable probability measure on $S$.

Let $(S, \mu)$ be a computable probability space. We know that the measure of a
c.e. open set $A$ is a c.e. real, but is not in general a computable real. On the other
hand, if $A$ is a decidable subset (i.e., $S \setminus A$ is also c.e. open) then $\mu(S \setminus A)$ is a c.e. real, and
therefore, by the identity $\mu(A) + \mu(S \setminus A) = 1$, we have that $\mu(A)$ is a computable
real. In connected spaces, the only decidable subsets are the empty set and the
whole space. However, there exists a useful surrogate when dealing with measure
spaces.

Definition 2.13 (Almost decidable set [GHR10, Def. 3.1.3]). Let $(S, \mu)$ be a computable
probability space. A (Borel) measurable subset $A \subseteq S$ is said to be $\mu$-almost
decidable when there are two c.e. open sets $U$ and $V$ such that $U \subseteq A$
and $V \subseteq S \setminus A$ and $\mu(U) + \mu(V) = 1$.

The following is immediate. 

Lemma 2.14 ([GHR10, Prop. 3.1.1]). Let $(S, \mu)$ be a computable probability space,
and let $A$ be $\mu$-almost decidable. Then $\mu(A)$ is a computable real.
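The reason this lemma is immediate can be sketched in code. The sketch below assumes we are handed two hypothetical procedures producing nondecreasing rational lower bounds for the measures of the two c.e. open sets $U$ and $V$ from the definition of almost decidability; these procedures and the function name are illustrative, not from the paper. Since the two measures sum to one, waiting until the lower bounds nearly sum to one pins down the measure of $A$.

```python
from fractions import Fraction
from itertools import count

def measure_of_almost_decidable(lower_U, lower_V, n):
    """Return a rational within 2**-n of mu(A).

    lower_U(k), lower_V(k): nondecreasing rational lower bounds converging to
    mu(U) and mu(V), where U lies inside A, V lies inside the complement of A,
    and mu(U) + mu(V) = 1 (these inputs are assumptions of the sketch).
    """
    eps = Fraction(1, 2**n)
    for k in count():
        a, b = lower_U(k), lower_V(k)
        if a + b > 1 - eps:
            # a <= mu(U) <= mu(A) <= 1 - mu(V) <= 1 - b < a + eps
            return a

# Hypothetical example: Lebesgue measure on [0, 1] with A = [0, 1/3],
# U = (0, 1/3) and V = (1/3, 1), each approximated from the inside.
lower_U = lambda k: Fraction(1, 3) * (1 - Fraction(1, 2**k))
lower_V = lambda k: Fraction(2, 3) * (1 - Fraction(1, 2**k))
approx = measure_of_almost_decidable(lower_U, lower_V, 10)
```

The loop terminates exactly because the two limits sum to one; for a general c.e. open set only the lower bounds are available and no such stopping rule exists.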

While we may not be able to compute the probability measure of ideal balls, we 
can compute a new basis of ideal balls for which we can. (See also Bosserhoff [Bos08, 
Lem. 2.15].) 

Lemma 2.15 ([GHR10, Thm. 3.1.2]). Let $(S, \mu)$ be a computable probability space,
and let $\mathcal{D}_S$ be the ideal points of $S$ with standard enumeration $\{d_i\}_{i \in \mathbb{N}}$. There is a
computable sequence $\{r_j\}_{j \in \mathbb{N}}$ of reals, dense in the positive reals, such that the balls
$\{B(d_i, r_j)\}_{i,j \in \mathbb{N}}$ form a basis of $\mu$-almost decidable sets.

We now show that every c.e. open set is the union of a computable sequence of 
almost decidable subsets. 

Lemma 2.16 (Almost decidable subsets). Let $(S, \mu)$ be a computable probability
space and let $V$ be a c.e. open set. Then, uniformly in $V$, we can compute a sequence
of $\mu$-almost decidable sets $\{V_k\}_{k \in \mathbb{N}}$ such that, for each $k$, $V_k \subseteq V_{k+1}$, and
$\bigcup_{k \in \mathbb{N}} V_k = V$.

Proof. Let $\{B_k\}_{k \in \mathbb{N}}$ be a standard enumeration of the ideal balls of $S$ where $B_k =
B(d_{m_k}, q_{l_k})$, and let $E \subseteq \mathbb{N}$ be a c.e. set such that $V = \bigcup_{k \in E} B_k$. Let $\{B(d_i, r_j)\}_{i,j \in \mathbb{N}}$
form a basis of $\mu$-almost decidable sets, as shown to be computable by Lemma 2.15.
Consider the c.e. set

$$F_k := \{ (i, j) : \delta_S(d_i, d_{m_k}) + r_j < q_{l_k} \}. \tag{5}$$

Because $\{d_i\}_{i \in \mathbb{N}}$ is dense in $S$ and $\{r_j\}_{j \in \mathbb{N}}$ is dense in the positive reals we have
for each $k \in \mathbb{N}$ that $B_k = \bigcup_{(i,j) \in F_k} B(d_i, r_j)$. In particular this implies that the set
$F := \bigcup_{k \in E} F_k$ is a c.e. set with $V = \bigcup_{(i,j) \in F} B(d_i, r_j)$. Let $\{(i_n, j_n)\}_{n \in \mathbb{N}}$ be a c.e.
enumeration of $F$ and let $V_k := \bigcup_{n \le k} B(d_{i_n}, r_{j_n})$, which is almost decidable. By
construction, for each $k$, $V_k \subseteq V_{k+1}$, and $\bigcup_{k \in \mathbb{N}} V_k = V$. □

Using the notion of an almost decidable set, we have the following characterization 
of computable measures. 

Corollary 2.17. Let $S$ be a computable metric space and let $\mu \in \mathcal{M}_1(S)$ be a
probability measure on $S$. Then $\mu$ is computable if the measure $\mu(A)$ of every
$\mu$-almost decidable set $A$ is a computable real, uniformly in $A$.

Proof. Let $V$ be a c.e. open set of $S$. By Theorem 2.11, it suffices to show that
$\mu(V)$ is a c.e. real, uniformly in $V$. By Lemma 2.16, we can compute a nested
sequence $\{V_k\}_{k \in \mathbb{N}}$ of $\mu$-almost decidable sets whose union is $V$. Because $V$ is open,
$\mu(V) = \sup_{k \in \mathbb{N}} \mu(V_k)$. By hypothesis, $\mu(V_k)$ is a computable real for each $k$, and so
the supremum is a c.e. real, as desired. □
Computable random variables have computable distributions.

Proposition 2.18 ([GHR10, Prop. 2.4.2]). Let $X$ be a computable random variable
in a computable metric space $S$. Then its distribution is a computable point in the
computable metric space $\mathcal{M}_1(S)$.

On the other hand, one can show that given a computable point $\mu$ in $\mathcal{M}_1(S)$, one
can construct an i.i.d.-$\mu$ sequence of computable random variables in $S$.

3. Conditional Probabilities and Distributions 

Informally, the conditional probability of an event B given an event A is the like- 
lihood that the event B occurs, given the knowledge that the event A has occurred. 

Definition 3.1 (Conditioning with respect to a single event). Let $S$ be a measurable
space and let $\mu \in \mathcal{M}_1(S)$ be a probability measure on $S$. Let $A, B \subseteq S$ be
measurable sets, and suppose that $\mu(A) > 0$. Then the conditional probability
of $B$ given $A$, written $\mu(B \mid A)$, is defined by

$$\mu(B \mid A) := \frac{\mu(B \cap A)}{\mu(A)}. \tag{6}$$

Note that for any fixed measurable set $A \subseteq S$ with $\mu(A) > 0$, the function $\mu(\,\cdot \mid A)$
is a probability measure. This notion of conditioning is well-defined precisely when
$\mu(A) > 0$, and so is insufficient for defining the conditional probability given the
event that a continuous random variable takes a particular value, as such an event
has measure zero.
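As a small illustration of Definition 3.1, here is a sketch of elementary conditioning on a finite probability space with rational weights. The function name and the die example are illustrative assumptions. The explicit check for a measure-zero conditioning event mirrors the restriction noted above.

```python
from fractions import Fraction

def conditional_probability(mu, B, A):
    """mu(B | A) = mu(B ∩ A) / mu(A) on a finite probability space.

    `mu` maps outcomes to rational weights; A and B are sets of outcomes.
    """
    mass_A = sum((mu[s] for s in A), Fraction(0))
    if mass_A == 0:
        raise ValueError("conditioning event has measure zero")
    mass_AB = sum((mu[s] for s in A & B), Fraction(0))
    return mass_AB / mass_A

# A fair six-sided die (hypothetical example).
die = {face: Fraction(1, 6) for face in range(1, 7)}
even = {2, 4, 6}
at_least_four = {4, 5, 6}
print(conditional_probability(die, at_least_four, even))
```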

We will often be interested in the case where $B$ and $A$ are measurable sets of the
form $\{Y \in D\}$ and $\{X \in C\}$. In this case, we define the abbreviation

$$\mathbf{P}\{Y \in D \mid X \in C\} := \mathbf{P}(\{Y \in D\} \mid \{X \in C\}). \tag{7}$$

Again, this is well-defined when $\mathbf{P}\{X \in C\} > 0$. As a special case, when $C = \{x\}$
is an atom, we obtain the notation

$$\mathbf{P}\{Y \in D \mid X = x\}. \tag{8}$$

The modern formulation of conditional probability is due to Kolmogorov [Kol33], 
and gives a consistent solution to the problem of conditioning on the value of general 
(and in particular, continuous) random variables. (See Kallenberg [Kal02, Chp. 6] 
for a rigorous treatment.) 

Definition 3.2 (Conditioning with respect to a random variable). Let $X$ and $Y$ be
random variables in measurable spaces $S$ and $T$, respectively, and let $B \subseteq T$ be
a measurable set. Then a measurable function $\mathbf{P}[Y \in B \mid X]$ from $S$ to $[0, 1]$ is (a
version of) the conditional probability of $Y \in B$ given $X$ when

$$\mathbf{P}\{X \in A,\ Y \in B\} = \int_A \mathbf{P}[Y \in B \mid X] \, d\mathbf{P}_X \tag{9}$$

for all measurable sets $A \subseteq S$. Moreover, $\mathbf{P}[Y \in B \mid X]$ is uniquely defined up to a
$\mathbf{P}_X$-null set (i.e., almost surely unique) and so we may sensibly refer to the conditional
probability when we mean a generic element of the equivalence class of versions.
Remark 3.3. It is also common to define a conditional probability given $X$ as a
$\sigma(X)$-measurable random variable $P$ that is the Radon-Nikodym derivative of the
measure $\mathbf{P}(\,\cdot \cap \{Y \in B\})$ with respect to $\mathbf{P}$, both viewed as measures on the
$\sigma$-algebra $\sigma(X)$. (See Kallenberg [Kal02, Chp. 6] for such a construction.) However,
there is a close relationship between these two definitions: in particular, there exists
a measurable function $h$ from $S$ to $[0, 1]$ such that $P = h(X)$ a.s. We have taken $h$ to be
our definition of the conditional probability as it is more natural to have a function
from $S$ in the statistical setting. (We take the same tack when defining conditional
distributions.) In Propositions 6.5 and 6.10, we demonstrate that both definitions
of conditional probability admit noncomputable versions.

It is natural to want to consider not just individual conditional probabilities but
the entire conditional distribution $\mathbf{P}[Y \in \cdot \mid X]$ of $Y$ given $X$. In order to define
conditional distributions, we first recall the notion of a probability kernel. (For
more details, see, e.g., [Kal02, Ch. 3, 6].)

Definition 3.4 (Probability kernel). Let $S$ and $T$ be measurable spaces. A function
$\kappa : S \times \mathcal{B}_T \to [0, 1]$ is called a probability kernel (from $S$ to $T$) when

(1) for every $s \in S$, the function $\kappa(s, \cdot)$ is a probability measure on $T$; and

(2) for every $B \in \mathcal{B}_T$, the function $\kappa(\cdot, B)$ is measurable.

It can be shown that $\kappa$ is a probability kernel from $S$ to $T$ if and only if $s \mapsto \kappa(s, \cdot)$
is a measurable map from $S$ to $\mathcal{M}_1(T)$ [Kal02, Lem. 1.40].

Definition 3.5 (Conditional distribution). Let $X$ and $Y$ be random variables in
measurable spaces $S$ and $T$, respectively. A probability kernel $\kappa$ is called a (regular)
version of the conditional distribution $\mathbf{P}[Y \in \cdot \mid X]$ when it satisfies

$$\mathbf{P}\{X \in A,\ Y \in B\} = \int_A \kappa(x, B) \, \mathbf{P}_X(dx), \tag{10}$$

for all measurable sets $A \subseteq S$ and $B \subseteq T$.

We will simply write $\mathbf{P}[Y \mid X]$ in place of $\mathbf{P}[Y \in \cdot \mid X]$.

Definition 3.6. Let $\mu$ be a measure on a topological space $S$ with open sets $\mathcal{S}$.
Then the support of $\mu$, written $\mathrm{supp}(\mu)$, is defined to be the set of points $x \in S$
such that all open neighborhoods of $x$ have positive measure, i.e.,

$$\mathrm{supp}(\mu) := \{ x \in S : \forall B \in \mathcal{S}\ (x \in B \implies \mu(B) > 0) \}. \tag{11}$$

Given any two versions of a conditional distribution, they need only agree almost 
everywhere. However, they will agree at points of continuity in the support: 

Lemma 3.7. Let $X$ and $Y$ be random variables in topological spaces $S$ and $T$, respectively,
and suppose that $\kappa_1, \kappa_2$ are versions of the conditional distribution $\mathbf{P}[Y \mid X]$.
Let $x \in S$ be a point of continuity of both of the maps $x \mapsto \kappa_i(x, \cdot)$ for $i = 1, 2$. If
$x \in \mathrm{supp}(\mathbf{P}_X)$, then $\kappa_1(x, \cdot) = \kappa_2(x, \cdot)$.

Proof. Fix a measurable set $A \subseteq T$ and define $g(\cdot) := \kappa_1(\cdot, A) - \kappa_2(\cdot, A)$. We know
that $g = 0$ $\mathbf{P}_X$-a.e., and also that $g$ is continuous at $x$. Assume, for the purpose of
contradiction, that $g(x) = \varepsilon > 0$. By continuity, there is an open neighborhood $B$
of $x$, such that $g(B) \subseteq (\frac{\varepsilon}{2}, \frac{3\varepsilon}{2})$. But $x \in \mathrm{supp}(\mathbf{P}_X)$, hence $\mathbf{P}_X(B) > 0$, contradicting
$g = 0$ $\mathbf{P}_X$-a.e. □

When conditioning on a discrete random variable, i.e. one whose image is a 
discrete set, it is well known that a version of the conditional distribution can be 
built by elementary conditioning with respect to single events. 

Lemma 3.8. Let $X$ and $Y$ be random variables on measurable spaces $S$ and $T$,
respectively. Suppose that $X$ is a discrete random variable with support $R \subseteq S$, and
let $\nu$ be an arbitrary probability measure on $T$. Define the function $\kappa : S \times \mathcal{B}_T \to [0, 1]$
by

$$\kappa(x, B) := \mathbf{P}\{Y \in B \mid X = x\} \tag{12}$$

for all $x \in R$ and $\kappa(x, \cdot) = \nu(\cdot)$ for $x \notin R$. Then $\kappa$ is a version of the conditional
distribution $\mathbf{P}[Y \mid X]$.

Proof. The function $\kappa$, given by

$$\kappa(x, B) := \mathbf{P}\{Y \in B \mid X = x\} \tag{13}$$

for all $x \in R$ and $\kappa(x, \cdot) = \nu(\cdot)$ for $x \notin R$, is well-defined because $\mathbf{P}\{X = x\} > 0$ for
all $x \in R$, and so the right hand side of Equation (13) is well-defined. Furthermore,
$\mathbf{P}\{X \in R\} = 1$ and so $\kappa$ is characterized by Equation (13) for $\mathbf{P}_X$-almost all $x$.
Finally, for all measurable sets $A \subseteq S$ and $B \subseteq T$, we have

$$\begin{aligned}
\int_A \kappa(x, B) \, \mathbf{P}_X(dx) &= \sum_{x \in R \cap A} \mathbf{P}\{Y \in B \mid X = x\} \, \mathbf{P}\{X = x\} && (14) \\
&= \sum_{x \in R \cap A} \mathbf{P}\{Y \in B,\ X = x\} && (15) \\
&= \mathbf{P}\{Y \in B,\ X \in A\}, && (16)
\end{aligned}$$

and so $\kappa$ is a version of the conditional distribution $\mathbf{P}[Y \mid X]$. □
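The construction in Lemma 3.8 can be sketched for a finite joint distribution as follows. This is an illustration under the assumption that the joint law is given as a table of rational probabilities; the names `conditional_kernel`, `joint`, and `fallback` are hypothetical. The sketch builds the kernel by elementary conditioning and uses an arbitrary fallback measure off the support, and the defining identity of Equation (10) can then be checked by summation.

```python
from fractions import Fraction

def conditional_kernel(joint, fallback):
    """Build kappa(x, .) from a finite joint pmf {(x, y): probability}.

    For x with P{X = x} > 0, kappa(x, .) is P{Y in . | X = x}; elsewhere it is
    the arbitrary probability measure `fallback` (the role of nu in Lemma 3.8).
    Returns the kernel together with the marginal pmf of X.
    """
    marginal = {}
    for (x, _), p in joint.items():
        marginal[x] = marginal.get(x, Fraction(0)) + p

    def kappa(x, B):
        if marginal.get(x, 0) == 0:
            return sum((w for y, w in fallback.items() if y in B), Fraction(0))
        hit = sum((p for (x2, y), p in joint.items() if x2 == x and y in B),
                  Fraction(0))
        return hit / marginal[x]

    return kappa, marginal

# Hypothetical joint distribution of (X, Y) with X in {0, 1} and Y in {"a", "b"}.
joint = {(0, "a"): Fraction(1, 4), (0, "b"): Fraction(1, 4), (1, "a"): Fraction(1, 2)}
fallback = {"a": Fraction(1)}
kappa, marginal = conditional_kernel(joint, fallback)

# Left-hand side of the kernel identity for A = {0, 1} and B = {"a"}.
lhs = sum((kappa(x, {"a"}) * marginal[x] for x in {0, 1}), Fraction(0))
```

The choice of `fallback` does not affect the identity, because points off the support carry no mass under the marginal of $X$.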

4. Computable Conditional Probabilities and Distributions 

We begin by demonstrating the computability of elementary conditional proba- 
bility given positive-measure events that are almost decidable. We then return to 
the abstract setting and lay the foundations for the remainder of the paper. 

Lemma 4.1 ([GHR10, Prop. 3.1.2]). Let $(S, \mu)$ be a computable probability space
and let $A$ be an almost decidable subset of $S$ satisfying $\mu(A) > 0$. Then $\mu(\,\cdot \mid A)$ is a
computable probability measure.

Proof. By Corollary 2.17, it suffices to show that $\frac{\mu(B \cap A)}{\mu(A)}$ is computable for an
almost decidable set $B$. But then $B \cap A$ is almost decidable and so its measure, the
numerator, is a computable real. The denominator is likewise the measure of an
almost decidable set, hence a computable real. Finally, the ratio of two computable
reals is computable. □
Conditioning in the abstract setting is more involved. Having defined the abstract
notion of conditional probabilities and conditional distributions in Section 3, we
now define notions of computability for these objects, starting with conditional
distributions.

Definition 4.2 (Computable probability kernel). Let $S$ and $T$ be computable metric
spaces and let $\kappa : S \times \mathcal{B}_T \to [0, 1]$ be a probability kernel from $S$ to $T$. Then we say
that $\kappa$ is a computable (probability) kernel when the map $\phi_\kappa : S \to \mathcal{M}_1(T)$
given by $\phi_\kappa(s) := \kappa(s, \cdot)$ is a computable function. Similarly, we say that $\kappa$ is
computable on a subset $D \subseteq S$ when $\phi_\kappa$ is computable on $D$.

The following correspondence will allow us to derive other characterizations of
computable probability kernels. The proof, however, will use the fact that, given
a sequence of ideal balls $B(d_1, q_1), \dots, B(d_n, q_n)$ and an ideal ball $B(d^*, q^*)$, we can
semi-decide when $B(d^*, q^*) \subseteq \bigcup_{i \le n} B(d_i, q_i)$ (uniformly in the indexes of the ideal
balls).

Henceforth, we will make the assumption that our computable metric spaces have
this property. This holds for all the specific spaces ($\mathbb{R}^k$, $\{0, 1\}$, $\{0, 1\}^{\infty}$, $\mathbb{N}$, etc.) that
we consider.

Proposition 4.3. Let $(T, \delta, \mathcal{D})$ be a computable metric space. Let $\mathcal{T}$ be the collection
of sets of the form

$$P_{A,q} = \{ \mu \in \mathcal{M}_1(T) : \mu(A) > q \} \tag{17}$$

where $A$ is a c.e. open subset of $T$ and $q$ is a rational. Then the elements of $\mathcal{T}$ are
c.e. open subsets of $\mathcal{M}_1(T)$ uniformly in $A$ and $q$.

Proof. Note that $\mathcal{T}$ is a subbasis for the weak topology induced by the Prokhorov
metric (see [Sch07a, Lem. 3.2]).

Let $P = P_{A,q}$ for a rational $q$ and c.e. open subset $A$.

We can write $A = \bigcup_{n \in \mathbb{N}} B(d_n, r_n)$ for a sequence (computable uniformly in $A$) of
ideal balls in $T$ with centers $d_n \in \mathcal{D}_T$ and rational radii $r_n$. Define
$A_m := \bigcup_{n \le m} B(d_n, r_n)$. Then $A_m \subseteq A_{m+1}$ and $A = \bigcup_m A_m$. Writing

$$P_m := \{ \mu \in \mathcal{M}_1(T) : \mu(A_m) > q \}, \tag{18}$$

we have $P = \bigcup_m P_m$. In order to show that $P$ is c.e. open uniformly in $q$ and $A$, it
suffices to show that $P_m$ is c.e. open, uniformly in $q$, $m$ and $A$.

Let $\mathcal{D}_P$ be the ideal points of $\mathcal{M}_1(T)$ (see Section 2.4), let $\nu \in \mathcal{D}_P$, and let $R$ be
the finite set on which it concentrates. Gács [Gac05, Prop. B.17] characterizes the
ideal ball $E$ centered at $\nu$ with rational radius $\varepsilon > 0$ as the set of measures
$\mu \in \mathcal{M}_1(T)$ such that

$$\mu(C^{\varepsilon}) > \nu(C) - \varepsilon \tag{19}$$

for all subsets $C \subseteq R$, where $C^{\varepsilon} = \bigcup_{t \in C} B(t, \varepsilon)$.

It is straightforward to show that $E \subseteq P_m$ if and only if $\nu(C_m) \ge q + \varepsilon$, where

$$C_m := \{ t \in R : B(t, \varepsilon) \subseteq A_m \}. \tag{20}$$

Note that $C_m$ is a decidable subset of $R$ (uniformly in $m$, $A$, and $E$) and that $\nu(C_m)$
is a rational and so we can decide whether $E \subseteq P_m$, showing that $P_m$ is c.e. open,
uniformly in $q$, $m$, and $A$. □

Recall that a lower semicomputable function from a computable metric space to
$[0, 1]$ is one for which the preimage of $(q, 1]$ is c.e. open, uniformly in rationals $q$.
Furthermore, we say that a function $f$ from a computable metric space $S$ to $[0, 1]$
is lower semicomputable on $D \subseteq S$ when there is a uniformly computable sequence
$\{U_q\}_{q \in \mathbb{Q}}$ of c.e. open sets such that

$$f^{-1}\big[(q, 1]\big] \cap D = U_q \cap D. \tag{21}$$

We can also interpret a computable probability kernel $\kappa$ as a computable map
sending each c.e. open set $A \subseteq T$ to a lower semicomputable function $\kappa(\cdot, A)$.

Lemma 4.4. Let $S$ and $T$ be computable metric spaces, let $\kappa$ be a probability kernel
from $S$ to $T$, and let $D \subseteq S$. If $\phi_\kappa$ is computable on $D$ then $\kappa(\cdot, A)$ is lower
semicomputable on $D$ uniformly in the c.e. open set $A$.

Proof. Let $q \in (0, 1)$ be a rational, and let $A$ be a c.e. open set. Define $I := (q, 1]$.
Then $\kappa(\cdot, A)^{-1}[I] = \phi_\kappa^{-1}[P]$, where

$$P := \{ \mu \in \mathcal{M}_1(T) : \mu(A) > q \}. \tag{22}$$

This is an open set in the weak topology induced by the Prokhorov metric (see
[Sch07a, Lem. 3.2]), and by Proposition 4.3, $P$ is c.e. open.

By the computability of $\phi_\kappa$, there is a c.e. open set $V$, uniformly computable in
$q$ and $A$ such that

$$\kappa(\cdot, A)^{-1}[I] \cap D = \phi_\kappa^{-1}[P] \cap D = V \cap D, \tag{23}$$

and so $\kappa(\cdot, A)$ is lower semicomputable on $D$, uniformly in $A$. □

In fact, when $A \subseteq T$ is a decidable set (i.e., when $A$ and $T \setminus A$ are both c.e. open),
$\kappa(\cdot, A)$ is a computable function.

Corollary 4.5. Let $S$ and $T$ be computable metric spaces, let $\kappa$ be a probability
kernel from $S$ to $T$ computable on a subset $D \subseteq S$, and let $A \subseteq T$ be a decidable
set. Then $\kappa(\cdot, A) : S \to [0, 1]$ is computable on $D$.

Proof. By Lemma 4.4, $\kappa(\cdot, A)$ and $\kappa(\cdot, T \setminus A)$ are lower semicomputable on $D$. But
$\kappa(x, A) = 1 - \kappa(x, T \setminus A)$ for all $x \in D$, and so $\kappa(\cdot, A)$ is upper semicomputable,
and therefore computable, on $D$. □

Although a conditional distribution may have many different versions, their
computability as probability kernels does not differ (up to a change in domain by a null
set).

Lemma 4.6. Let $X$ and $Y$ be computable random variables on computable metric
spaces $S$ and $T$, respectively, and let $\kappa$ be a version of a conditional distribution
$\mathbf{P}[Y \mid X]$ that is computable on some $\mathbf{P}_X$-measure one set. Then any version of $\mathbf{P}[Y \mid X]$
is also computable on some $\mathbf{P}_X$-measure one set.

Proof. Let $\kappa$ be a version that is computable on a $\mathbf{P}_X$-measure one set $D$, and let $\kappa'$
be any other version. Then $Z := \{ s \in S : \kappa(s, \cdot) \ne \kappa'(s, \cdot) \}$ is a $\mathbf{P}_X$-null set, and
$\kappa = \kappa'$ on $D \setminus Z$. Hence $\kappa'$ is computable on the $\mathbf{P}_X$-measure one set $D \setminus Z$. □

This observation motivates the following definition of computability for condi- 
tional distributions. 

Definition 4.7 (Computable conditional distributions). Let $X$ and $Y$ be computable
random variables on computable metric spaces $S$ and $T$, respectively, and let $\kappa$ be
a version of the conditional distribution $\mathbf{P}[Y \mid X]$. We say that $\mathbf{P}[Y \mid X]$ is computable
when $\kappa$ is computable on a $\mathbf{P}_X$-measure one subset of $S$.

Note that this definition is analogous to our Definition 2.9 of a computable random
variable. In fact, if $\kappa$ is a version of a computable conditional distribution $\mathbf{P}[Y \mid X]$,
then $\kappa(X, \cdot)$ is a ($\sigma(X)$-measurable) computable random probability measure (i.e.,
a probability-measure-valued random variable).

Intuitively, a conditional distribution is computable when for some (and hence for
any) version $\kappa$ there is a program that, given as input a representation of a point
$s \in S$, outputs a representation of the measure $\phi_\kappa(s) = \kappa(s, \cdot)$ for $\mathbf{P}_X$-almost all
inputs $s$.

Suppose that $\mathbf{P}[Y \mid X]$ is computable, i.e., there is a version $\kappa$ for which the map
$\phi_\kappa$ is computable on some $\mathbf{P}_X$-measure one set $S' \subseteq S$. As noted in Definition 4.2,
we will often abuse notation and say that $\kappa$ is computable on $S'$. The restriction
of $\phi_\kappa$ to $S'$ is necessarily continuous (under the subspace topology on $S'$). We say
that $\kappa$ is $\mathbf{P}_X$-almost continuous when the restriction of $\phi_\kappa$ to some $\mathbf{P}_X$-measure
one set is continuous. Thus when $\mathbf{P}[Y \mid X]$ is computable, there is some $\mathbf{P}_X$-almost
continuous version.

We will need the following lemma in the proof of Lemma 6.3, but we postpone 
the proof until Section 8.2. 

Lemma 4.8. Let $X$ and $Y$ be random variables on metric spaces $S$ and $T$, respectively,
and let $R \subseteq S$. If a conditional density $p_{X|Y}(x \mid y)$ of $X$ given $Y$ is continuous
on $R \times T$, positive, and bounded, then there is a version of the conditional distribution
$\mathbf{P}[Y \mid X]$ that is continuous on $R$. In particular, if $R$ is a $\mathbf{P}_X$-measure one subset,
then there is a $\mathbf{P}_X$-almost continuous version.

We now define computability for conditional probability.

Definition 4.9 (Computable conditional probability). Let $X$ and $Y$ be computable
random variables in computable metric spaces $S$ and $T$, respectively, and let $B \subseteq T$
be a $\mathbf{P}_Y$-almost decidable set. We say that the conditional probability $\mathbf{P}[Y \in B \mid X]$
is computable when it is computable (when viewed as a function from $S$ to $[0, 1]$)
on a $\mathbf{P}_X$-measure one set.

In Section 5 we describe a pair of computable random variables X, Y for which 
P[Y|X] is not computable, by virtue of no version being Px-almost continuous. In 
Section 6 we describe a pair of computable random variables X, Y for which there is 
a Px-almost continuous version of P[Y|X], but still no version that is computable 
on a Px-measure one set. 
5. Discontinuous Conditional Distributions

Our study of the computability of conditional distributions begins with a descrip- 
tion of the following roadblock: a conditional distribution need not have any version 
that is continuous or even almost continuous (in the sense described in Section 4). 
This rules out computability. 

We will work with the standard effective presentations of the spaces $\mathbb{R}$, $\mathbb{N}$, $\{0, 1\}$,
as well as product spaces thereof, as computable metric spaces. For example, we
will use $\mathbb{R}$ under the Euclidean metric, along with the ideal points $\mathbb{Q}$ under their
standard enumeration.

Recall that a random variable $C$ is a Bernoulli($p$) random variable, or equivalently,
a $p$-coin, when $\mathbf{P}\{C = 1\} = 1 - \mathbf{P}\{C = 0\} = p$. We call a $\frac{1}{2}$-coin a fair coin.
A random variable $N$ is geometric when it takes values in $\mathbb{N} = \{0, 1, 2, \dots\}$ and
satisfies

$$\mathbf{P}\{N = n\} = 2^{-(n+1)}, \quad \text{for } n \in \mathbb{N}. \tag{24}$$

A random variable that takes values in a discrete set is a uniform random variable
when it assigns equal probability to each element. A continuous random variable $U$
on the unit interval is uniform when the probability that it falls in the subinterval
$[\ell, r]$ is $r - \ell$. It is easy to show that the distributions of these random variables are
computable.

Let $C$, $U$, and $N$ be independent computable random variables, where $C$ is a fair
coin, $U$ is a uniform random variable on $[0, 1]$, and $N$ is a geometric random variable.
Fix a computable enumeration $\{r_i\}_{i \in \mathbb{N}}$ of the rational numbers (without repetition)
in $(0, 1)$, and consider the random variable

$$X := \begin{cases} U, & \text{if } C = 1; \\ r_N, & \text{otherwise.} \end{cases} \tag{25}$$

It is easy to verify that $X$ is a computable random variable.
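The following sketch shows how one might sample the pair $(C, X)$ just described. It is illustrative only: the particular enumeration of the rationals (by increasing denominator), the function names, and the use of a floating-point number as a stand-in for a uniform real are all assumptions of the sketch, not part of the construction in the text.

```python
import random
from fractions import Fraction
from itertools import count, islice
from math import gcd

def rationals_in_unit_interval():
    """Enumerate the rationals in (0, 1) without repetition: 1/2, 1/3, 2/3, 1/4, ..."""
    for q in count(2):
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)

def sample_geometric(rng):
    """P{N = n} = 2**-(n+1): count fair-coin successes before the first failure."""
    n = 0
    while rng.random() < 0.5:
        n += 1
    return n

def sample_X(rng):
    """Return (C, X) with X = U if C == 1 and X = r_N otherwise."""
    C = rng.randrange(2)
    if C == 1:
        return C, rng.random()  # float stand-in for a uniform real on [0, 1]
    N = sample_geometric(rng)
    r_N = next(islice(rationals_in_unit_interval(), N, None))
    return C, r_N

rng = random.Random(0)
samples = [sample_X(rng) for _ in range(5)]
```

Sampling is straightforward here; the difficulty established below concerns recovering the distribution of $C$ from an observed value of $X$.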

Proposition 5.1. Every version of the conditional distribution $\mathbf{P}[C \mid X]$ is discontinuous
everywhere on every $\mathbf{P}_X$-measure one set. In particular, every version of the
conditional probability $\mathbf{P}[C = 0 \mid X]$ is discontinuous everywhere on every $\mathbf{P}_X$-measure
one set.

Proof. Note that $\mathbf{P}\{X \text{ rational}\} = \frac{1}{2}$ and, furthermore, $\mathbf{P}\{X = r_k\} = \frac{1}{2^{k+2}} > 0$.
Therefore, any two versions of the conditional distribution $\mathbf{P}[C \mid X]$ must agree on all
rationals in $[0, 1]$. In addition, because $\mathbf{P}_U \ll \mathbf{P}_X$, i.e.,

$$\mathbf{P}\{U \in A\} > 0 \implies \mathbf{P}\{X \in A\} > 0 \tag{26}$$

for all measurable sets $A \subseteq [0, 1]$, any two versions must agree on a Lebesgue-measure
one set of the irrationals in $[0, 1]$. An elementary calculation shows that

$$\mathbf{P}\{C = 0 \mid X \text{ rational}\} = 1, \tag{27}$$

while

$$\mathbf{P}\{C = 0 \mid X \text{ irrational}\} = 0. \tag{28}$$

Therefore, all versions $\kappa$ of $\mathbf{P}[C \mid X]$ satisfy, for $\mathbf{P}_X$-almost all $x$,

$$\kappa(x, \{0\}) = \begin{cases} 1, & x \text{ rational}; \\ 0, & x \text{ irrational}, \end{cases} \tag{29}$$

where the right hand side, considered as a function of $x$, is the so-called Dirichlet
function, a nowhere continuous function.

Suppose some version of the conditional probability $x \mapsto \kappa(x, \{0\})$ were continuous
at a point $y$ on a $\mathbf{P}_X$-measure one set $R$. Then there would exist an open interval
$I$ containing $y$ such that the image of $I \cap R$ contains 0 or 1, but not both. However,
$R$ must contain all rationals in $I$ and almost all irrationals in $I$. Furthermore, the
image of every rational in $I \cap R$ is 1, and the image of almost every irrational in
$I \cap R$ is 0, a contradiction. □

Discontinuity is a fundamental obstacle to computability, but many conditional 
probabilities do admit continuous versions, and we can focus our attention on such 
settings, to rule out this objection. In particular, we might hope to be able to 
compute a conditional distribution of a pair of computable random variables when 
there is some version that is almost continuous or even continuous. However we will 
show that even this is not possible in general. 

6. Noncomputable Almost Continuous Conditional Distributions

In this section, we construct a pair of random variables (X, N) that is computable, 
yet whose conditional distribution P[N|X] is not computable, despite the existence 
of a Px-almost continuous version. 

Let $M_n$ denote the $n$th Turing machine, under a standard enumeration, and let
$h : \mathbb{N} \to \mathbb{N} \cup \{\infty\}$ be the map given by $h(n) = \infty$ if $M_n$ does not halt (on input 0) and
$h(n) = k$ if $M_n$ halts (on input 0) at the $k$th step. Let $H = \{ n \in \mathbb{N} : h(n) < \infty \}$ be
the indices of those Turing machines that halt (on input 0). We now use $h$ to define
a pair $(N, X)$ of computable random variables such that $H$ is computable from the
conditional distribution of $N$ given $X$.

Let $N$ be a computable geometric random variable, $C$ a computable $\frac{1}{3}$-coin, and
$U$ and $V$ both computable uniform random variables on $[0, 1]$, all mutually independent.
Let $\lfloor x \rfloor$ denote the greatest integer $y \le x$. Note that $\lfloor 2^k V \rfloor$ is uniformly
distributed on $\{0, 1, 2, \dots, 2^k - 1\}$. Consider the derived random variables

$$X_k := \frac{2 \lfloor 2^k V \rfloor + C + U}{2^{k+1}} \tag{30}$$

for $k \in \mathbb{N}$. Almost surely, the limit $X_\infty := \lim_{k \to \infty} X_k$ exists and satisfies $X_\infty = V$ a.s.
Finally, we define $X := X_{h(N)}$.

Proposition 6.1. The random variable X is computable. 

Proof. Because $U$ and $V$ are computable and a.s. nondyadic, their (a.s. unique)
binary expansions $\{U_n : n \in \mathbb{N}\}$ and $\{V_n : n \in \mathbb{N}\}$ are themselves computable
random variables in $\{0, 1\}$, uniformly in $n$.

For each $k \ge 0$, define the random variable

$$D_k := \begin{cases} V_k, & h(N) > k; \\ C, & h(N) = k; \\ U_{k - h(N) - 1}, & h(N) < k. \end{cases} \tag{31}$$

By simulating $M_N$ for $k$ steps, we can decide whether $h(N)$ is less than, equal to, or
greater than $k$. Therefore the random variables $\{D_k\}_{k \ge 0}$ are computable, uniformly
in $k$. We now show that, with probability one, $\{D_k\}_{k \ge 0}$ is the binary expansion of
$X$, thus showing that $X$ is itself a computable random variable.

There are two cases to consider:

First, conditioned on $h(N) = \infty$, we have that $D_k = V_k$ for all $k \ge 0$. In fact,
$X = V$ when $h(N) = \infty$, and so the binary expansions match.

Second, condition on $h(N) = m$ and let $D$ denote the computable random real whose
binary expansion is $\{D_k\}_{k \ge 0}$. We must then show that $D = X_m$ a.s. Note that

$$\lfloor 2^m X_m \rfloor = \lfloor 2^m V \rfloor = \sum_{k=0}^{m-1} 2^{m-1-k} V_k = \lfloor 2^m D \rfloor, \tag{32}$$

and thus the binary expansions agree for the first $m$ digits. Finally, notice that
$2^{m+1} X_m - 2 \lfloor 2^m X_m \rfloor = C + U$, and so the next binary digit of $X_m$ is $C$, followed by
the binary expansion of $U$, thus agreeing with $D$ for all $k \ge 0$. □
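The digit bookkeeping in this proof can be checked on exact rationals. The sketch below is illustrative: it takes the halting time as a plain argument `h` (with `None` standing for infinity) rather than simulating a Turing machine, and it uses rational stand-ins for $V$ and $U$; the function names are hypothetical. It computes $X_k$ from Equation (30), generates digits according to Equation (31), and compares them with the binary expansion of $X_k$.

```python
from fractions import Fraction
from math import floor

def binary_digits(x, n):
    """First n binary digits of a rational x in [0, 1)."""
    digits = []
    for _ in range(n):
        x *= 2
        d = floor(x)
        digits.append(d)
        x -= d
    return digits

def X_k(k, V, C, U):
    """X_k := (2*floor(2**k * V) + C + U) / 2**(k+1), as in Equation (30)."""
    return (2 * floor(2**k * V) + C + U) / 2**(k + 1)

def D_digits(h, V, C, U, n):
    """Digits D_0, ..., D_{n-1} from Equation (31); h=None encodes h(N) = infinity."""
    v, u = binary_digits(V, n), binary_digits(U, n)
    out = []
    for k in range(n):
        if h is None or h > k:
            out.append(v[k])          # still copying the digits of V
        elif h == k:
            out.append(C)             # the coin appears at position h
        else:
            out.append(u[k - h - 1])  # afterwards, the digits of U
    return out

# Rational stand-ins for the uniform variables (hypothetical values).
V, U, C, h = Fraction(1, 3), Fraction(1, 5), 1, 3
digits_from_D = D_digits(h, V, C, U, 8)
digits_from_X = binary_digits(X_k(h, V, C, U), 8)
```

With these inputs both lists begin with the first three digits of $V$, then the coin, then the digits of $U$. Each digit $D_k$ only requires knowing how $h$ compares with $k$, which is what a $k$-step simulation provides.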

We now show that P[N|X] is not computable, despite the existence of a Px-almost 
continuous version of P[N|X]. We begin by characterizing the conditional density of 
X given N. 

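Lemma 6.2. For each $k \in \mathbb{N} \cup \{\infty\}$, the distribution of $X_k$ admits a density $p_{X_k}$
with respect to Lebesgue measure on $[0, 1]$, where, for all $k < \infty$ and $\mathbf{P}_X$-almost all
$x$,

$$p_{X_k}(x) = \begin{cases} \frac{4}{3}, & \lfloor 2^{k+1} x \rfloor \text{ even}; \\ \frac{2}{3}, & \lfloor 2^{k+1} x \rfloor \text{ odd}, \end{cases} \tag{33}$$

and $p_{X_\infty}(x) = 1$.

Proof. We have $X_\infty = V$ a.s. and so the constant function taking the value 1 is the
density of $X_\infty$ with respect to Lebesgue measure on $[0, 1]$.

Let $k \in \mathbb{N}$. With probability one, the integer part of $2^{k+1} X_k$ is $2 \lfloor 2^k V \rfloor + C$ while
the fractional part is $U$. Therefore, the distribution of $2^{k+1} X_k$ (and hence $X_k$) admits
a piecewise constant density with respect to Lebesgue measure.

In particular, $\lfloor 2^{k+1} X_k \rfloor \equiv C \pmod 2$ almost surely and $2 \lfloor 2^k V \rfloor$ is independent of
$C$ and uniformly distributed on $\{0, 2, \dots, 2^{k+1} - 2\}$. Therefore,

$$\mathbf{P}\{ \lfloor 2^{k+1} X_k \rfloor = \ell \} = 2^{-k} \cdot \begin{cases} \frac{2}{3}, & \ell \text{ even}; \\ \frac{1}{3}, & \ell \text{ odd}, \end{cases} \tag{34}$$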

[Figure 1 image not reproduced: rows labeled n = 0 through n = 5, with axis marks at 1/2, 1/4, and 1/8.]

Figure 1. A visualization of $(X, Y)$, where $Y$ is uniformly distributed
and $N = \lfloor -\log_2 Y \rfloor$. Regions that appear (at low resolution)
to be uniform can suddenly be revealed (at higher resolutions) to be 
patterned. Deciding whether the pattern is in fact uniform (or below 
the resolution of this printer/display) is tantamount to solving the 
halting problem, but it is possible to sample from this distribution 
nonetheless. Note that this is not a plot of the density, but instead 
a plot where the darkness of each pixel is proportional to its measure. 

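for every $\ell \in \{0, 1, \dots, 2^{k+1} - 1\}$. It follows immediately that the density $p$ of $2^{k+1} X_k$
with respect to Lebesgue measure on $[0, 2^{k+1}]$ is given by

$$p(x) = 2^{-k} \cdot \begin{cases} \frac{2}{3}, & \lfloor x \rfloor \text{ even}; \\ \frac{1}{3}, & \lfloor x \rfloor \text{ odd}, \end{cases} \tag{35}$$

and so the density of $X_k$ is obtained by rescaling: $p_{X_k}(x) = 2^{k+1} \cdot p(2^{k+1} x)$. □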

As $X_k$ admits a density with respect to Lebesgue measure on $[0, 1]$ for all $k \in
\mathbb{N} \cup \{\infty\}$, it follows that the conditional distribution of $X$ given $N$ admits a conditional
density $p_{X|N}$ (with respect to Lebesgue measure on $[0, 1]$) given by

$$p_{X|N}(x \mid n) = p_{X_{h(n)}}(x). \tag{36}$$

Each of these densities is positive, bounded, and continuous on the nondyadic reals,
and so they can be combined to form a $\mathbf{P}_X$-almost continuous version of the
conditional distribution.



Lemma 6.3. There is a $\mathbf{P}_X$-almost continuous version of $\mathbf{P}[N \mid X]$.

Proof. By Bayes' rule (Lemma 8.6), the probability kernel $\kappa$ given by

$$\kappa(x, B) := \frac{\sum_{n \in B} p_{X|N}(x \mid n) \cdot \mathbf{P}\{N = n\}}{\sum_{n \in \mathbb{N}} p_{X|N}(x \mid n) \cdot \mathbf{P}\{N = n\}} \quad \text{for } B \subseteq \mathbb{N} \tag{37}$$

is a version of the conditional distribution $\mathbf{P}[N \mid X]$. Moreover, $p_{X|N}$ is positive,
bounded and continuous on $R \times \mathbb{N}$, where $R$ is the $\mathbf{P}_X$-measure one set of nondyadic
reals in the unit interval. It follows from Lemma 4.8 that $\kappa$ is $\mathbf{P}_X$-almost continuous. □

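Lemma 6.4. Let $\kappa$ be a version of $\mathbf{P}[N \mid X]$. For all $m, n \in \mathbb{N}$ and $\mathbf{P}_X$-almost all $x$,

$$\tau_{m,n}(x) := 2^{m-n} \cdot \frac{\kappa(x, \{m\})}{\kappa(x, \{n\})} \in \begin{cases} \{\frac{1}{2}, 1, 2\}, & h(n), h(m) < \infty; \\ \{1\}, & h(n) = h(m) = \infty; \\ \{\frac{2}{3}, \frac{3}{4}, \frac{4}{3}, \frac{3}{2}\}, & \text{otherwise.} \end{cases}$$

Proof. Let $m, n \in \mathbb{N}$. By (37) and then (36), for $\mathbf{P}_X$-almost all $x$,

$$\tau_{m,n}(x) = 2^{m-n} \cdot \frac{p_{X|N}(x \mid m) \cdot \mathbf{P}\{N = m\}}{p_{X|N}(x \mid n) \cdot \mathbf{P}\{N = n\}} = \frac{p_{X_{h(m)}}(x)}{p_{X_{h(n)}}(x)}.$$

By Lemma 6.2, we have $p_{X_\infty}(x) = 1$ and $p_{X_k}(x) \in \{\frac{2}{3}, \frac{4}{3}\}$ for all $k < \infty$ and
$\mathbf{P}_X$-almost all $x$. The result follows by considering all possible combinations of values
for each regime of values for $h(n)$ and $h(m)$. □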
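The case analysis in this proof can be checked mechanically. The sketch below is a hedged illustration: it treats the halting times $h(m)$ and $h(n)$ as given inputs (with `None` standing for infinity), since computing them in general is exactly the halting problem, and the function names are hypothetical. It evaluates the densities of Lemma 6.2 at nondyadic rationals and confirms that the density ratio lands in the set predicted for each regime.

```python
from fractions import Fraction
from math import floor

def density(k, x):
    """p_{X_k}(x) from Lemma 6.2 at a nondyadic x; k=None encodes k = infinity."""
    if k is None:
        return Fraction(1)
    return Fraction(4, 3) if floor(2**(k + 1) * x) % 2 == 0 else Fraction(2, 3)

def tau(h_m, h_n, x):
    """Density ratio p_{X_{h(m)}}(x) / p_{X_{h(n)}}(x) underlying Lemma 6.4."""
    return density(h_m, x) / density(h_n, x)

BOTH_HALT = {Fraction(1, 2), Fraction(1), Fraction(2)}
MIXED = {Fraction(2, 3), Fraction(3, 4), Fraction(4, 3), Fraction(3, 2)}

# Nondyadic sample points in (0, 1) and a few hypothetical halting times.
xs = [Fraction(i, 97) for i in range(1, 97)]
both = {tau(a, b, x) for x in xs for a in range(5) for b in range(5)}
mixed = {tau(a, None, x) for x in xs for a in range(5)} | \
        {tau(None, a, x) for x in xs for a in range(5)}
```

The two value sets are disjoint, which is what lets the subsequent argument read off from the ratio whether $h(m)$ is finite once $h(n)$ is known to be finite.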

Despite the computability of $N$ and $X$, conditioning on $X$ leads to noncomputability.

Proposition 6.5. For all $k$, the conditional probability $\mathbf{P}[N = k \mid X]$ is not lower
semicomputable (hence not computable) on any measurable subset $R \subseteq [0, 1]$ where
$\mathbf{P}_X(R) > \frac{2}{3}$.

Proof. Choose $n$ to be the index of some Turing machine $M_n$ that halts (on input 0),
i.e., for which $h(n) < \infty$, let $m \in \mathbb{N}$, let $\kappa$ be a version of the conditional distribution,
let $\tau = \tau_{m,n}$ be as in Lemma 6.4, and let $V_{<\infty}$ and $V_\infty$ be disjoint c.e. open sets that
contain $\{\frac{1}{2}, 1, 2\}$ and $\{\frac{2}{3}, \frac{3}{4}, \frac{4}{3}, \frac{3}{2}\}$, respectively. By Lemma 6.4, when $h(m) < \infty$, we
have $\mathbf{P}_X(\tau^{-1}[V_{<\infty}]) = 1$. Likewise, when $h(m) = \infty$, we have $\mathbf{P}_X(\tau^{-1}[V_\infty]) = 1$.

Suppose that for some $k$, $\mathbf{P}[N = k \mid X]$ were lower semicomputable on some set
$R$ satisfying $\mathbf{P}_X(R) > \frac{2}{3}$. Note that the numerator of Equation (37) is computable
on the nondyadics, uniformly in $B = \{k\}$, and the denominator is independent of
$B$. Therefore, because $\mathbf{P}[N = k \mid X] = \kappa(\cdot, \{k\})$ $\mathbf{P}_X$-a.e., it follows that $\mathbf{P}[N = k \mid X]$
is lower semicomputable on $R$, uniformly in $k$. But then, uniformly in $k$, we have
that $\mathbf{P}[N = k \mid X]$ is computable on $R$ because $\{k\}$ is a decidable subset of $\mathbb{N}$ and

$$\mathbf{P}[N = k \mid X] = 1 - \sum_{\ell \ne k} \mathbf{P}[N = \ell \mid X] \quad \text{a.s.} \tag{38}$$

Therefore, $\tau$ is computable on $R$, uniformly in $m$.
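It follows that there exist c.e. open sets $U_{<\infty}$ and $U_\infty$, uniformly in $m$, such that
$\tau^{-1}[V_{<\infty}] \cap R = U_{<\infty} \cap R$ and thus $\mathbf{P}_X(U_{<\infty} \cap R) = \mathbf{P}_X(\tau^{-1}[V_{<\infty}] \cap R)$, and similarly
for $V_\infty$ and $U_\infty$.

If $h(m) < \infty$, then $\mathbf{P}_X(\tau^{-1}[V_{<\infty}]) = 1$ and thus

$$\mathbf{P}_X(U_{<\infty}) \ge \mathbf{P}_X(U_{<\infty} \cap R) = \mathbf{P}_X(\tau^{-1}[V_{<\infty}] \cap R) = \mathbf{P}_X(R) > \tfrac{2}{3}. \tag{39}$$

Similarly, $h(m) = \infty$ implies that $\mathbf{P}_X(U_\infty) > \frac{2}{3}$. Therefore, at least one of

$$\mathbf{P}_X(U_{<\infty}) > \tfrac{2}{3} \quad \text{or} \quad \mathbf{P}_X(U_\infty) > \tfrac{2}{3} \tag{40}$$

holds.

As $V_{<\infty} \cap V_\infty = \emptyset$, we have $U_{<\infty} \cap U_\infty \cap R = \emptyset$ and so $\mathbf{P}_X(U_{<\infty} \cap U_\infty) < 1 - \frac{2}{3}$.
Thus, $\mathbf{P}_X(U_{<\infty}) + \mathbf{P}_X(U_\infty) \le 1 + \mathbf{P}_X(U_{<\infty} \cap U_\infty) < 1 + (1 - \frac{2}{3}) = \frac{4}{3}$ and so we have
that at most one (and hence exactly one) of (i) $\mathbf{P}_X(U_{<\infty}) > \frac{2}{3}$ and $h(m) < \infty$ or
(ii) $\mathbf{P}_X(U_\infty) > \frac{2}{3}$ and $h(m) = \infty$.

But $\mathbf{P}_X(U_{<\infty})$ and $\mathbf{P}_X(U_\infty)$ are c.e. reals, uniformly in $m$, and so we can
computably distinguish between cases (i) and (ii), and thus decide whether or not
$h(m) < \infty$, or equivalently, whether or not $m \in H$, uniformly in $m$. But $H$ is not
computable and so we have a contradiction. □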
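The final step of this argument, distinguishing two cases from c.e. reals, can be sketched as a race between two lower-approximation sequences. The sketch is purely illustrative of that step: the two input sequences are hypothetical, and the proposition shows that, for the actual construction, such sequences cannot both be produced by a program, since otherwise the loop below would decide the halting set. The function name and return labels are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import count

def decide_halting_case(lower_U_fin, lower_U_inf, threshold=Fraction(2, 3)):
    """Race two nondecreasing rational lower-approximation sequences.

    Assumes exactly one of the two limits exceeds `threshold`, as established
    in the proof. Returns "halts" if the sequence for U_{<infinity} crosses the
    threshold, and "diverges" if the sequence for U_infinity does.
    """
    for k in count():
        if lower_U_fin(k) > threshold:
            return "halts"
        if lower_U_inf(k) > threshold:
            return "diverges"

# Hypothetical approximations with limits 3/4 and 1/4.
fin = lambda k: Fraction(3, 4) * (1 - Fraction(1, 2**k))
inf = lambda k: Fraction(1, 4) * (1 - Fraction(1, 2**k))
print(decide_halting_case(fin, inf))
```

Termination relies on the guarantee that one limit strictly exceeds the threshold; correctness relies on the other limit staying below it, which is what the bound on the sum of the two measures provides.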

Corollary 6.6. The conditional distribution $\mathbf{P}[N \mid X]$ is not computable.

Proof. As $\{n\}$ is a decidable subset of $\mathbb{N}$, uniformly in $n$, it follows from Corollary 4.5
that the conditional probability $\mathbf{P}[N = n \mid X]$ is computable on a measure one set
when the conditional distribution is computable. By Proposition 6.5, the former is
not computable, and so the latter is not computable. □

Note that these proofs show that not only is the conditional distribution P[N|X] 
noncomputable, but in fact it computes the halting set H in the following sense. 
Although we have not defined the notion of an oracle that encodes P[N|X], one 
could make this concept precise using, e.g., infinite strings in the Type-2 Theory 
of Effectivity (TTE). However, despite not having a definition of computability 
from this distribution, we can easily relativize the notion of computability for the 
distribution. In particular, an analysis of the above proof shows that if $\mathbf{P}[N \mid X]$ is
$A$-computable for an oracle $A \subseteq \mathbb{N}$, then $A$ computes the halting set, i.e., $A \ge_T \emptyset'$.

Computable operations map computable points to computable points, and so we 
obtain the following consequence. 

Theorem 6.7. The operation $(X, Y) \mapsto \mathbf{P}[Y \mid X]$ of conditioning a pair of real-valued
random variables is not computable, even when restricted to pairs for which there
exists a $\mathbf{P}_X$-almost continuous version of the conditional distribution.

Conditional probabilities are often thought about as generic elements of $L^1(\mathbf{P}_X)$
equivalence classes (i.e., functions that are equivalent up to $\mathbf{P}_X$-null sets). However,
Proposition 6.5 also rules out the computability of conditional probabilities in the
weaker sense of so-called $L^1(\mathbf{P}_X)$-computability.
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Definition 6.8 {L^{fi) computability [PER84]). Let /i be a computable measure 
on [0, 1]. Then a function / E L^ilj) is said to be L^(/i)-coniputable if there is a 
computable sequence of rational polynomials /„ such that 

\\f-fn\\l = j\f-fnW<2-^. (41) 

If we let Si{f,g) = f \f — 9\dfJ. then we can turn L^{fJ-) into a computable metric 
space by quotienting out by the equivalence relation ^i (/, g) = and letting the 
dense T>i consist of those equivalence classes containing a polynomial with rational 
coefficients. A function / is then L^(;u)-computable if and only if its equivalence 
class is a computable point in L^(/i) considered as a computable metric space. 
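As a hedged illustration of Definition 6.8, take the measure to be Lebesgue measure on [0, 1] and the function to be $e^x$; both choices are assumptions made for this example and do not come from the text. Truncated Taylor polynomials have rational coefficients, and on [0, 1] the tail bound $\sum_{j > d} 1/j! \le 2/(d+1)!$ gives an explicit degree for each $n$. The helper names below are hypothetical.

```python
from fractions import Fraction
from math import exp, factorial


def taylor_coeffs(n):
    """Rational coefficients of a polynomial f_n with ||exp - f_n||_1 <= 2^-n on [0,1].

    Uses the sup-norm tail bound 2/(d+1)!, which also bounds the L^1 error
    because Lebesgue measure on [0,1] has total mass 1.
    """
    d = 0
    while Fraction(2, factorial(d + 1)) > Fraction(1, 2**n):
        d += 1
    return [Fraction(1, factorial(j)) for j in range(d + 1)]


def evaluate(coeffs, x):
    """Evaluate the polynomial at a float x."""
    return sum(float(c) * x**j for j, c in enumerate(coeffs))


def l1_distance(coeffs, grid=10_000):
    """Midpoint-rule estimate of the integral of |exp(x) - f_n(x)| over [0,1]."""
    h = 1.0 / grid
    return sum(
        abs(exp((i + 0.5) * h) - evaluate(coeffs, (i + 0.5) * h)) for i in range(grid)
    ) * h
```

The function `taylor_coeffs` plays the role of the computable sequence $f_n$; `l1_distance` is only a floating-point sanity check, not part of the definition.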

The following result is an easy consequence of the fact that $L^1(\mu)$-computable
functions are computable "in probability".

Proposition 6.9. For all $n \in \mathbb{N}$, the conditional probability P[N = n|X] is not
$L^1(P_X)$-computable.

Proof. Let $n \in \mathbb{N}$. By Proposition 6.5, the conditional probability P[N = n|X] is not
computable on any Borel set $R$ such that $P_X(R) > \frac{2}{3}$. On the other hand, a result
by Hoyrup and Rojas [HR09b, Thm. 3.1.1] implies that a function $f$ is $L^1(P_X)$-computable
only if $f$ is computable on some $P_X$-measure $1 - 2^{-r}$ set, uniformly in
$r$. Therefore, P[N = n|X] is not $L^1(P_X)$-computable. □

In fact, this result can be strengthened to a statement about the noncomputability
of each conditional probability P[N = n|X] when viewed as $\sigma(X)$-measurable random
variables (Remark 3.3), or, equivalently, $\sigma(X)$-measurable Radon-Nikodym derivatives.
(See Kallenberg [Kal02, Chp. 6] for a discussion of the relationship between
conditional expectation, Radon-Nikodym derivatives and conditional probability.)
In this form, we tighten a result of Hoyrup and Rojas [HR11, Prop. 3].

Proposition 6.10. Let $n \in \mathbb{N}$, and then consider the conditional probability
P[N = n|X] as a $\sigma(X)$-measurable Radon-Nikodym derivative $\eta_n : \{0,1\}^\infty \to [0,1]$
of $P(\,\cdot \cap \{N = n\})$ with respect to $P$. Then $\eta_n$ is not lower semicomputable (and
hence not computable) on any measurable subset $S \subseteq \{0,1\}^\infty$ where $P(S) > \frac{2}{3}$. In
particular, $\eta_n$ is not $L^1(P)$-computable.

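Proof. Let $\kappa$ be a version of the conditional distribution P[N|X], and, for $n \in \mathbb{N}$, let
$\eta_n$ be defined as above. In particular, note that, for all $n \in \mathbb{N}$ and $P$-almost all $\omega$,

$$\eta_n(\omega) = \kappa(X(\omega), \{n\}). \tag{42}$$

In particular, this implies that, for all $n, m \in \mathbb{N}$ and $P$-almost all $\omega$,

$$2^{m-n} \cdot \frac{\eta_m(\omega)}{\eta_n(\omega)} \in \begin{cases} \{\frac{1}{2}, 1, 2\}, & h(n), h(m) < \infty; \\ \{1\}, & h(n) = h(m) = \infty; \\ \{\frac{2}{3}, \frac{3}{4}, \frac{4}{3}, \frac{3}{2}\}, & \text{otherwise.} \end{cases} \tag{43}$$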

Using Equations (37), (42), and (43), one can express $\eta_n$ as a ratio whose numerator
is computable on a $P$-measure one set, uniformly in $n$, and whose denominator is
positive for $P$-almost all $\omega$ and independent of $n$. One can then show, along the
same lines as Proposition 6.5, that $\eta_n$ is not lower semicomputable on any set $S$
as defined above, for otherwise we could decide $H$. This then rules out the $L^1(P)$-computability
of $\eta_n$. □

It is natural to ask whether this construction can be extended to produce a pair of 
computable random variables whose conditional distribution is noncomputable but 
has an everywhere continuous version. We provide such a strengthening in Section 7. 

Despite these results, many important questions remain: How badly noncom- 
putable is conditioning, even restricted to these continuous settings? In what re- 
stricted settings is conditioning computable? In Section 8, we begin to address the 
latter of these. 

7. Noncomputable Everywhere Continuous Conditional Distributions 

As we saw in Section 5, discontinuity poses a fundamental obstacle to the com- 
putability of conditional probabilities. As such, it is natural to ask whether we 
can construct a pair of random variables (Z, N) that is computable and admits an 
everywhere continuous version of the conditional distribution P[N|Z], which is itself 
nonetheless not computable. In fact, this is possible using a construction similar to 
that of (X, N) in Section 6. 

In particular, if we think of the construction of the $k$th bit of X as an iterative
process, we see that there are two distinct stages. During the first stage, which
occurs so long as $k < h(N)$, the bits of X simply mimic those of the uniform random
variable V. Then during the second stage, once $k \ge h(N)$, the bits mimic those of
$\frac{1}{2}(C + U)$.

Our construction of Z will differ in the second stage, where the bits of Z will 
instead mimic those of a random variable S specially designed to smooth out the 
rough edges caused by the biased coin C, while still allowing us to encode the 
halting set. In particular, S will be absolutely continuous and will have an infinitely 
differentiable density. 

We now make the construction precise. Let N, U, V and C be as in the first 
construction. We begin by defining several random variables from which we will 
construct S, and then Z. 

Lemma 7.1. There is a random variable F in [0, 1] with the following properties:

(1) F is computable.

(2) $P_F$ admits a density $p_F$ with respect to Lebesgue measure (on [0,1]) that is
infinitely differentiable everywhere.

(3) $p_F(0) = \frac{2}{3}$ and $p_F(1) = \frac{4}{3}$.

(4) $\frac{d^n_+}{dx^n} p_F(0) = \frac{d^n_-}{dx^n} p_F(1) = 0$, for all $n \ge 1$ (where $\frac{d^n_-}{dx^n}$ and $\frac{d^n_+}{dx^n}$ are the left and
right derivatives respectively).

(See Figure 2 for one such random variable.) Let F be as in Lemma 7.1, and
independent of all earlier random variables mentioned. Note that F is almost surely
nondyadic and so the $r$-th bit $F_r$ of F is a computable random variable, uniformly
in $r$.

Figure 2. (left) The graph of the function defined by $f(x) = \exp\{-\frac{1}{1 - x^2}\}$,
for $x \in (-1, 1)$, and 0 otherwise, a $C^\infty$ bump function
whose derivatives at ±1 are all 0. (right) A density
$p(y) = \frac{2}{3}\left(\frac{\Phi(2y - 1)}{\Phi(1)} + 1\right)$, for $y \in (0, 1)$, of a random variable
satisfying Lemma 7.1, where $\Phi(y) = \int_{-1}^{y} \exp\{-\frac{1}{1 - x^2}\}\, dx$ is the integral of
the bump function.
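The endpoint values and normalization of a density like the one in Figure 2 can be checked numerically. The sketch below assumes the density has the form $p(y) = \frac{2}{3}\left(\Phi(2y-1)/\Phi(1) + 1\right)$, with $\Phi$ the integral of the bump function $\exp\{-1/(1-x^2)\}$ from $-1$; this is one reading of the figure caption, and the function names are hypothetical. It uses a plain midpoint rule in floating point, so it is a sanity check rather than a computable-analysis construction.

```python
from math import exp


def bump(x):
    """C-infinity bump: exp(-1/(1 - x^2)) on (-1, 1), and 0 elsewhere."""
    return exp(-1.0 / (1.0 - x * x)) if -1.0 < x < 1.0 else 0.0


def Phi(y, steps=2000):
    """Midpoint-rule approximation of the integral of bump from -1 to y."""
    y = max(-1.0, min(1.0, y))
    h = (y + 1.0) / steps
    return sum(bump(-1.0 + (i + 0.5) * h) for i in range(steps)) * h


PHI_1 = Phi(1.0)


def density(y):
    """Candidate density (2/3) * (Phi(2y - 1) / Phi(1) + 1) on [0, 1]."""
    return (2.0 / 3.0) * (Phi(2.0 * y - 1.0) / PHI_1 + 1.0)


# Midpoint-rule estimate of the total mass on [0, 1].
total_mass = sum(density((i + 0.5) / 200) for i in range(200)) / 200
```

Under these assumptions the density equals 2/3 at 0 and 4/3 at 1, and the symmetry $\Phi(u) + \Phi(-u) = \Phi(1)$ makes the total mass 1.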

Let D be a computable random variable, independent of all earlier random variables
mentioned, and uniformly distributed on {0, 1, . . . , 7}. Consider

$$S = \frac{1}{8} \times \begin{cases} F, & \text{if } D = 0; \\ 4 + (1 - F), & \text{if } D = 4; \\ 4C + (D \bmod 4) + U, & \text{otherwise.} \end{cases} \tag{44}$$

It is clear that S is itself a computable random variable, and straightforward to
show that

(i) $P_S$ admits an infinitely differentiable density $p_S$ with respect to Lebesgue
measure on [0, 1]; and

(ii) For all $n \ge 0$, we have $\frac{d^n_+}{dx^n} p_S(0) = \frac{d^n_-}{dx^n} p_S(1)$.

(For a visualization of the density $p_S$ see Figure 3.)

We say a real $x \in [0,1]$ is valid for $P_S$ if $x \in (\frac{1}{8}, \frac{1}{2}) \cup (\frac{5}{8}, 1)$. In particular,
when $D \notin \{0, 4\}$, then S is valid for $P_S$. The following are then straightforward
consequences of the construction of S and the definition of valid points:

(iii) If $x$ is valid for $P_S$ then $p_S(x) \in \{\frac{2}{3}, \frac{4}{3}\}$.

(iv) The $P_S$-measure of points valid for $P_S$ is $\frac{3}{4}$.
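The mass of the valid region can be checked by simulating the construction of S in Equation (44). The following Monte Carlo sketch makes two assumptions that are not stated in this section: the biased coin C is taken to satisfy P(C = 1) = 1/3, and F is replaced by a uniform stand-in, since the probability of landing in the valid region does not depend on the law of F. All function names are hypothetical.

```python
import random


def sample_S(rng, sample_F):
    """One draw of S following Equation (44).

    Assumes C is a coin with P(C = 1) = 1/3 and U is uniform on [0, 1).
    """
    D = rng.randrange(8)
    if D == 0:
        return sample_F(rng) / 8
    if D == 4:
        return (4 + (1 - sample_F(rng))) / 8
    C = 1 if rng.random() < 1 / 3 else 0
    U = rng.random()
    return (4 * C + (D % 4) + U) / 8


def is_valid(x):
    """Valid points for P_S: the open intervals (1/8, 1/2) and (5/8, 1)."""
    return 1 / 8 < x < 1 / 2 or 5 / 8 < x < 1


rng = random.Random(0)
draws = [sample_S(rng, lambda r: r.random()) for _ in range(200_000)]
frac_valid = sum(map(is_valid, draws)) / len(draws)
```

Under the stated assumption on C, about half of the draws should fall in (1/8, 1/2) and about a quarter in (5/8, 1), matching constant densities of 4/3 and 2/3 on those intervals.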

Next we define, for every $k \in \mathbb{N}$, the random variables $Z_k$ mimicking the construction
of $X_k$. Specifically, for $k \in \mathbb{N}$, define

$$Z_k := \frac{\lfloor 2^k V \rfloor + S}{2^k}, \tag{45}$$

Figure 3. Graph of the density of S, when constructed from F as
given in Figure 2.



and let $Z_\infty := \lim_{k \to \infty} Z_k = V$ a.s. Then the $n$th bit of $Z_k$ is

$$(Z_k)_n = \begin{cases} V_n, & n < k; \\ S_{n-k}, & n \ge k \end{cases} \quad \text{a.s.} \tag{46}$$



For $k < \infty$, we say that $x \in [0, 1]$ is valid for $P_{Z_k}$ if the fractional part of $2^k x$ is
valid for $P_S$, and we say that $x$ is valid for $P_{Z_\infty}$ for all $x$. Let $A_k$ be the collection of
$x$ valid for $P_{Z_k}$, and let $A_\infty = [0, 1]$. It is straightforward to verify that $P_Z(A_k) \ge$ […]
for all $k < \infty$, and 1 for $k = \infty$.
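The bit structure of $Z_k$ and the definition of validity can be checked on exact rationals. The sketch below assumes 0-indexed binary digits and uses the valid region $(\frac{1}{8}, \frac{1}{2}) \cup (\frac{5}{8}, 1)$ for S; the sample values of V and S and all function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from math import floor


def bits(x, n):
    """First n binary digits (0-indexed) of a rational x in [0, 1)."""
    out = []
    for _ in range(n):
        x *= 2
        b = floor(x)
        out.append(b)
        x -= b
    return out


def Z_k(V, S, k):
    """Z_k = (floor(2^k V) + S) / 2^k."""
    return (floor(2**k * V) + S) / Fraction(2**k)


def is_valid_S(x):
    """Valid points for P_S (assumed region)."""
    return Fraction(1, 8) < x < Fraction(1, 2) or Fraction(5, 8) < x < 1


def is_valid_Zk(x, k):
    """x is valid for P_{Z_k} if the fractional part of 2^k x is valid for P_S."""
    return is_valid_S((2**k * x) % 1)


V, S = Fraction(5, 7), Fraction(3, 11)
z = Z_k(V, S, 4)
```

Here the first four digits of `z` agree with those of `V` and the remaining digits are those of `S`, so validity of `z` for $P_{Z_4}$ reduces to validity of `S` for $P_S$.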

It is also straightforward to show from (i) and (ii) above that $P_{Z_k}$ admits an
infinitely differentiable density $p_{Z_k}$ with respect to Lebesgue measure on [0, 1].

To complete the construction, we define $Z := Z_{h(N)}$. The following results are
analogous to those in the almost continuous construction:

Lemma 7.2. The random variable Z is computable. 

Lemma 7.3. There is an everywhere continuous version of P[N|Z].

Proof. By construction, the conditional density of Z is everywhere continuous, bounded,
and positive. The result follows from Lemma 4.8 for $R = [0, 1]$. □

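Lemma 7.4. Let $\kappa$ be a version of the conditional distribution P[N|Z]. For all
$m, n \in \mathbb{N}$ and $P_Z$-almost all $x$, if $x$ is valid for $P_{Z_{h(n)}}$ and $P_{Z_{h(m)}}$, then

$$\tau_{m,n}(x) := 2^{m-n} \cdot \frac{\kappa(x, \{m\})}{\kappa(x, \{n\})} \in \begin{cases} \{\frac{1}{2}, 1, 2\}, & h(n), h(m) < \infty; \\ \{1\}, & h(n) = h(m) = \infty; \\ \{\frac{2}{3}, \frac{3}{4}, \frac{4}{3}, \frac{3}{2}\}, & \text{otherwise.} \end{cases}$$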



Proposition 7.5. For all $k$, the conditional probability P[N = k|Z] is not lower
semicomputable (hence not computable) on any measurable subset $R \subseteq [0, 1]$ where
$P_Z(R) >$ […].

Proof. Choose $n$ to be the index of a Turing machine $M_n$ that does not halt (on input
0) so that $h(n) = \infty$, let $m \in \mathbb{N}$, let $\kappa$ be a version of the conditional distribution, let
$\tau = \tau_{m,n}$ be defined as in Lemma 7.4, and let $V_{<\infty}$ and $V_\infty$ be disjoint c.e. open sets
containing the points $\{\frac{2}{3}, \frac{3}{4}, \frac{4}{3}, \frac{3}{2}\}$ and $\{1\}$, respectively. Notice that all $x \in [0, 1]$ are
valid for $P_{Z_{h(n)}} = P_{Z_\infty}$ and so $A_{h(n)} \cap A_{h(m)} = A_{h(m)}$. Therefore, by Lemma 7.4,
when $h(m) < \infty$, we have $P_Z(\tau^{-1}[V_{<\infty}]) \ge P_Z(A_{h(m)}) \ge$ […]. On the other hand,
when $h(m) = \infty$, we have $P_Z(\tau^{-1}[V_\infty]) = P_Z(A_\infty) = 1$. The remainder of the proof
is similar to that of Proposition 6.5, replacing X with Z, replacing $\frac{2}{3}$ with […], and
noting, e.g., that $P_Z(\tau^{-1}[V_{<\infty}] \cap R) >$ […] $-$ […] when $h(m) < \infty$. In particular, were
P[N = k|Z] lower semicomputable on $R$, we could computably distinguish whether
or not $m \in H$, uniformly in $m$, which is a contradiction. □

As before, it follows immediately that P[N|Z] is not computable. It is possible to 
carry on the same development, showing the non-computability of the conditional 
probabilities as elements in $L^1(P_X)$ and $L^1(P)$. For simplicity, we state the following
strengthening of Theorem 6.7. 

Theorem 7.6. Let X and Y be computable real-valued random variables. Then the
operation $(X, Y) \mapsto$ P[X|Y] of conditioning a pair of real-valued random variables is
not computable, even when restricted to pairs for which there exists an everywhere 
continuous version of the conditional distribution. 

8. Positive Results 

We now consider situations in which we can compute conditional distributions. 
Probabilistic methods have been widely successful in many settings, and so it is 
important to understand situations in which conditional inference is possible. We 
begin with the setting of discrete random variables. 

8.1. Discrete Random Variables. A very common situation is that in which we 
condition with respect to a random variable taking values in a discrete set. As we 
will see, conditioning is always possible in this setting, as it reduces to the elementary 
notion of conditional probability with respect to single events. 

Lemma 8.1 (Conditioning on a discrete random variable). Let X and Y be com- 
putable random variables in computable metric spaces S and T, respectively, where 
S is a finite or countable and discrete set. Then the conditional distribution P[Y|X] 
is computable, uniformly in X and Y. 

Proof. Let $\kappa$ be a version of the conditional distribution P[Y|X], and define
$S_+ := \mathrm{supp}(P_X) = \{x \in S : P_X\{x\} > 0\}$. We have $P_X(S_+) = 1$, and so, by
Lemma 3.8, $\kappa$ is characterized by the equations

$$\kappa(x, \cdot\,) = P\{Y \in \cdot \mid X = x\} = \frac{P\{Y \in \cdot\,,\ X = x\}}{P_X\{x\}}, \qquad x \in S_+. \tag{47}$$

Let $B \subseteq T$ be a $P_Y$-almost decidable set. Because $\{x\}$ is decidable, $\{x\} \times B$ is $P_{(X,Y)}$-almost
decidable, and so the numerator is a computable real, uniformly in $x \in S_+$
and $B$. Taking $B = T$, we see that the denominator, and hence the ratio, is also
computable, uniformly in $x \in S_+$ and $B$. Thus, by Corollary 2.17, $x \mapsto \kappa(x, \cdot\,)$ is
computable on a $P_X$-measure one set. □
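For a finite joint distribution given as an explicit table, the ratio in Equation (47) can be evaluated exactly. The sketch below is a simplification: the lemma concerns computable random variables, where numerator and denominator are computable reals approximated to any precision, whereas here they are exact rationals. The function name and the toy table are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def condition_on_x(joint):
    """Return kappa[x][y] = P{Y = y | X = x} for every x with P{X = x} > 0.

    `joint` maps pairs (x, y) to probabilities.
    """
    px = defaultdict(Fraction)  # marginal P_X{x}, the denominator of (47)
    for (x, _), p in joint.items():
        px[x] += p
    kappa = defaultdict(dict)
    for (x, y), p in joint.items():
        if px[x] > 0:  # restrict to the support S_+
            kappa[x][y] = p / px[x]
    return dict(kappa)


joint = {
    (0, "a"): Fraction(1, 4),
    (0, "b"): Fraction(1, 4),
    (1, "a"): Fraction(1, 2),
}
kappa = condition_on_x(joint)
```

Each row `kappa[x]` sums to 1, which corresponds to taking $B = T$ in the proof.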

Note that one could provide an alternative proof using the "rejection sampling" 
method, which is used to define the semantics of conditioning on discrete random 
variables in several probabilistic programming languages. This could, for example, 
be formalized by using least fixed points or unbounded search. 
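A minimal sketch of the rejection-sampling reading, written as an unbounded search: draw from the joint distribution until the observed value of X appears, then return the accompanying Y. The model `toy_model` and the function names are hypothetical, and the loop terminates almost surely only when the observed value has positive probability.

```python
import random


def rejection_sample(sample_xy, x_obs, rng):
    """Draw (X, Y) repeatedly until X == x_obs, then return Y.

    Terminates with probability one when P{X = x_obs} > 0.
    """
    while True:
        x, y = sample_xy(rng)
        if x == x_obs:
            return y


def toy_model(rng):
    """Hypothetical model: Y is a fair die roll and X is the parity of Y."""
    y = rng.randint(1, 6)
    return y % 2, y


rng = random.Random(0)
samples = [rejection_sample(toy_model, 0, rng) for _ in range(1000)]
```

Every returned value is an even die roll, so the samples are draws from the conditional distribution of Y given X = 0.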

A related situation is one where a computable random variable X concentrates on 
a finite or countable subset of a computable metric space. In this case, Equation (47)
is well-defined and characterizes every version of the conditional distribution, but the
event {X = x}, for $x \in \mathrm{supp}(P_X)$, is not decidable. However, if the countable subset
is discrete, then each such event is almost decidable. In order to characterize the
computability of conditioning in this setting, we first define a notion of computability 
for discrete subsets of computable metric spaces. 

Definition 8.2 (Computably discrete set). Let $S$ be a computable metric space.
We say that a (finite or countably infinite) subset $D \subseteq S$ is a computably discrete
subset when there exists a function $f : S \to \mathbb{N}$ that is computable and injective on
$D$. We call $f$ the witness to the computable discreteness of $D$.

The following result is an easy consequence of the definition of computably dis- 
crete subsets and the computability of conditional distributions given discrete ob- 
servations. 

Lemma 8.3. Let X and Y be computable random variables in computable metric
spaces $S$ and $T$, respectively, let $D \subseteq S$ be a computably discrete subset with witness
$f$, and assume that $X \in D$ a.s. Then the conditional distribution P[Y|X] is
computable, uniformly in X, Y and (the witness for) $f$.

Proof. Let $\kappa$ be a version of the conditional distribution P[Y|X] and let $\kappa_f$ be a
version of the conditional distribution P[Y|f(X)]. Then, for $x \in D$,

$$\kappa(x, B) = \kappa_f(f(x), B), \tag{48}$$

and these equations completely characterize $\kappa$ as $P_X(D) = P_{f(X)}(f(D)) = 1$. Note
that $f(X)$ is a computable random variable in $\mathbb{N}$, and so by Lemma 8.1, $\kappa_f$ is
computable on $f(D)$, and so $\kappa$ is computable on $D$. □

8.2. Continuous and Dominated Settings. The most common way to calculate 
conditional probabilities is to use Bayes' rule, which requires the existence of a 
conditional density. (Within statistics, a probability model is said to be dominated 
when there is a conditional density.) We first recall some basic definitions. 

Definition 8.4 (Density). Let $(S, \mathcal{A}, \nu)$ be a measure space and let $f : S \to \mathbb{R}^+$ be
a $\nu$-integrable function. Then the function $\mu$ on $\mathcal{A}$ given by

$$\mu(A) = \int_A f \, d\nu \tag{49}$$

for $A \in \mathcal{A}$ is a (finite) measure on $(S, \mathcal{A})$ and $f$ is called a density of $\mu$ (with
respect to $\nu$). Note that $g$ is a density of $\mu$ with respect to $\nu$ if and only if $f = g$
$\nu$-a.e.

Definition 8.5 (Conditional density). Let X and Y be random variables in (complete
separable) metric spaces, let $\kappa_{X|Y}$ be a version of P[X|Y], and assume that
there is a measure $\nu \in \mathcal{M}(S)$ and measurable function $p_{X|Y}(x|y) : S \times T \to \mathbb{R}^+$ such
that $p_{X|Y}(\,\cdot\,|y)$ is a density of $\kappa_{X|Y}(y, \cdot\,)$ with respect to $\nu$ for $P_Y$-almost all $y$. That
is,

$$\kappa_{X|Y}(y, A) = \int_A p_{X|Y}(x|y) \, \nu(dx) \tag{50}$$

for measurable sets $A \subseteq S$ and $P_Y$-almost all $y$. Then $p_{X|Y}(x|y)$ is called a conditional
density of X given Y (with respect to $\nu$).

Common parametric families of distributions (e.g., exponential families like Gauss- 
ian, Gamma, etc.) admit conditional densities, and in these cases, the well-known 
Bayes' rule gives a formula for expressing the conditional distribution. We give a 
proof of this classic result for completeness. 

Lemma 8.6 (Bayes' rule [Sch95, Thm. 1.13]). Let X and Y be random variables
as in Definition 3.5, let $\kappa_{X|Y}$ be a version of the conditional distribution P[X|Y],
and assume that there exists a conditional density $p_{X|Y}(x|y)$. Then a function
$\kappa_{Y|X} : S \times \mathcal{B}_T \to [0, 1]$ satisfying

$$\kappa_{Y|X}(x, B) = \frac{\int_B p_{X|Y}(x|y) \, P_Y(dy)}{\int p_{X|Y}(x|y) \, P_Y(dy)} \tag{51}$$

for those points $x$ for which the denominator is positive and finite, is a version of
the conditional distribution P[Y|X].

Proof. By Definition 3.5 and Fubini's theorem, for Borel sets $A \subseteq S$ and $B \subseteq T$, we
have that

$$P\{X \in A, Y \in B\} = \int_B \kappa_{X|Y}(y, A) \, P_Y(dy) \tag{52}$$

$$= \int_B \left( \int_A p_{X|Y}(x|y) \, \nu(dx) \right) P_Y(dy) \tag{53}$$

$$= \int_A \left( \int_B p_{X|Y}(x|y) \, P_Y(dy) \right) \nu(dx). \tag{54}$$

Taking $B = T$, we have

$$P_X(A) = \int_A \left( \int p_{X|Y}(x|y) \, P_Y(dy) \right) \nu(dx). \tag{55}$$

Because $P_X(S) = 1$, this implies that the set of points $x$ for which the denominator
of (51) is infinite has $\nu$-measure zero, and thus $P_X$-measure zero. Taking $A$ to be
the set of points $x$ for which the denominator is zero, we see that $P_X(A) = 0$. It
follows that (51) characterizes $\kappa_{Y|X}$ up to a $P_X$-null set. Also by (55), we see that the
denominator is a density of $P_X$ with respect to $\nu$, and so we have

$$\int_A \kappa_{Y|X}(x, B) \, P_X(dx) = \int_A \kappa_{Y|X}(x, B) \left( \int p_{X|Y}(x|y) \, P_Y(dy) \right) \nu(dx). \tag{56}$$

Finally, by the definition of $\kappa_{Y|X}$, Equation (54), and the fact that the denominator is
positive and finite for $P_X$-almost all $x$, we see that $\kappa_{Y|X}$ is a version of the conditional
distribution P[Y|X]. □
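When $P_Y$ has finite support, both integrals in Bayes' rule (51) become finite sums and the formula can be evaluated directly. The sketch below assumes an exponential likelihood $p_{X|Y}(x|y) = y e^{-yx}$ and a two-point prior; both are illustrative choices, not taken from the text, and the function names are hypothetical. It works in floating point and so ignores the precision bookkeeping that a computable-analysis treatment would require.

```python
from math import exp


def bayes_posterior(prior, likelihood, x):
    """Posterior over y given X = x, for a finitely supported prior.

    prior: dict mapping y to P_Y{y}; likelihood(x, y) is the conditional
    density p_{X|Y}(x|y). Raises if the denominator of (51) is not
    positive and finite.
    """
    weights = {y: likelihood(x, y) * p for y, p in prior.items()}
    denom = sum(weights.values())
    if not (0.0 < denom < float("inf")):
        raise ValueError("denominator must be positive and finite")
    return {y: w / denom for y, w in weights.items()}


def exponential_likelihood(x, rate):
    """Density of an exponential random variable with the given rate."""
    return rate * exp(-rate * x) if x >= 0 else 0.0


prior = {1.0: 0.5, 2.0: 0.5}
post_large = bayes_posterior(prior, exponential_likelihood, 1.0)
post_small = bayes_posterior(prior, exponential_likelihood, 0.1)
```

A larger observation favors the smaller rate and a smaller observation favors the larger rate, as the two posteriors show.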

Comparing Bayes' rule (51) to the definition of conditional density (50), we see
that any conditional density of Y given X (with respect to $P_Y$) satisfies

$$p_{Y|X}(y|x) = \frac{p_{X|Y}(x|y)}{\int p_{X|Y}(x|y) \, P_Y(dy)} \tag{57}$$

for $P_{(X,Y)}$-almost all $(x, y)$. We can now give the proof of Lemma 4.8.

Proof of Lemma 4.8. Let $\kappa_{Y|X}$ be given by (51), and let $B \subseteq T$ be an open set. By hypothesis,
the map $\varphi : S \to C(T, \mathbb{R}^+)$ given by $\varphi(x) = p_{X|Y}(x|\,\cdot\,)$ is continuous on
$R$, while the indicator function $1_B$ is lower semicontinuous. Integration of a lower
semicontinuous function with respect to a probability measure is a lower semicontinuous
operation, and so the map $x \mapsto \int 1_B \, \varphi(x) \, dP_Y$ is lower semicontinuous on
$R$.

Note that for every $x \in R$, the function $\varphi(x)$ is positive and bounded by hypothesis.
Integration of a bounded continuous function with respect to a probability
measure is a continuous operation, and so the map $x \mapsto \int \varphi(x) \, dP_Y$ is positive and
continuous on $R$. Therefore the ratio in (51) is a lower semicontinuous function of
$x \in R$ for fixed $B$, completing the proof. □

Using the following well-known result about integration of computable functions, 
we can study when the conditional distribution characterized by Bayes' rule is com- 
putable. 

Proposition 8.7 ([HR09a, Cor. 4.3.2]). Let $S$ be a computable metric space, and
$\mu$ a computable probability measure on $S$. Let $f : S \to \mathbb{R}^+$ be a bounded computable
function. Then $\int f \, d\mu$ is a computable real, uniformly in $f$.
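A very special case gives some intuition for what it means for an integral to be a computable real. The sketch below assumes Lebesgue measure on [0, 1] and a function with a known Lipschitz constant, which is far more restrictive than the proposition; the function name and parameters are hypothetical. Given a precision parameter $k$, it returns a rational within $2^{-k}$ of the integral, using the midpoint rule with an explicit error bound.

```python
from fractions import Fraction
from math import ceil


def integrate(f, lipschitz, k):
    """Rational approximation, within 2^-k, of the integral of f over [0, 1].

    Assumes |f(x) - f(y)| <= lipschitz * |x - y| and that f maps rationals
    to rationals. With n cells the midpoint rule has error at most
    lipschitz / (4 n), so n >= lipschitz * 2^k / 4 suffices.
    """
    n = max(1, ceil(Fraction(lipschitz) * 2**k / 4))
    h = Fraction(1, n)
    return sum(f((i + Fraction(1, 2)) * h) for i in range(n)) * h


approx = integrate(lambda x: x * x, 2, 10)
```

The point of the example is the interface: precision in, rational approximation out, with a proof-carrying error bound rather than an empirical one.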

Corollary 8.8 (Density and independence). Let U, V, and Y be computable random
variables (in computable metric spaces), where Y is independent of V given U.
Assume that there exists a conditional density $p_{Y|U}(y|u)$ of Y given U (with respect
to $\nu$) that is bounded and computable. Then the conditional distribution P[(U, V)|Y]
is computable.

Proof. Let X = (U, V). Then $p_{Y|X}(y|(u, v)) = p_{Y|U}(y|u)$ is the conditional density of
Y given X (with respect to $\nu$). Therefore, the computability of the integrand and the
existence of a bound imply, by Proposition 8.7, that P[(U, V)|Y] is computable. □

8.3. Conditioning on Noisy Observations. As an immediate consequence of
Corollary 8.8, we obtain the computability of the following common situation in 
probabilistic modeling: where the observed random variable has been corrupted by 
independent absolutely continuous noise. 

Corollary 8.9 (Independent noise). Let U be a computable random variable in a
computable metric space and let V and E be computable random variables in $\mathbb{R}$.
Define Y = U + E. If $P_E$ is absolutely continuous with a bounded computable density
$p_E$ and E is independent of U and V then the conditional distribution P[(U, V) | Y]
is computable.

Proof. We have that

$$p_{Y|U}(y|u) = p_E(y - u) \tag{58}$$

is the conditional density of Y given U (with respect to Lebesgue measure). The
result then follows from Corollary 8.8. □
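The identity $p_{Y|U}(y|u) = p_E(y - u)$ turns conditioning on a noisy observation into a reweighting of the prior by the noise density. The sketch below assumes a real-valued U with finite support and Gaussian noise; both are illustrative assumptions, and the function names are hypothetical. It is a floating-point sketch and will fail by underflow if the noise scale is made extremely small.

```python
from math import exp, pi, sqrt


def gaussian_density(e, sigma):
    """Density of zero-mean Gaussian noise with standard deviation sigma."""
    return exp(-0.5 * (e / sigma) ** 2) / (sigma * sqrt(2 * pi))


def posterior_given_noisy(prior_u, y, sigma):
    """Posterior over U given Y = U + E = y, with E independent of U.

    prior_u maps each value u to its prior probability; the weight of u
    is p_E(y - u) times the prior.
    """
    weights = {u: gaussian_density(y - u, sigma) * p for u, p in prior_u.items()}
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}


prior_u = {0.0: 0.5, 1.0: 0.5}
coarse = posterior_given_noisy(prior_u, 0.2, sigma=1.0)
sharp = posterior_given_noisy(prior_u, 0.2, sigma=0.1)
```

With the observation 0.2, the posterior mass on 0 is modest under wide noise and close to 1 under narrow noise.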

Pour-El and Richards [PER89, Ch. 1, Thm. 2] show that a twice continuously 
differentiable computable function has a computable derivative (despite the fact 
that Myhill [Myh71] exhibits a computable function $[0, 1] \to \mathbb{R}$ whose derivative
is continuous, but not computable). Therefore, noise with a sufficiently smooth 
distribution has a computable density, and by Corollary 8.9, a computable random 
variable corrupted by such noise still admits a computable conditional distribution. 

Furthermore, Corollary 8.9 implies that noiseless observations cannot always be
computably approximated by noisy ones. For example, even though an observation
corrupted with zero mean Gaussian noise with standard deviation $\sigma$ may recover
the original condition as $\sigma \to 0$, by our main noncomputability result (Theorem 6.7)
one cannot, in general, compute how small $\sigma$ must be in order to bound the error
introduced by noise.

This result is also analogous to a classical theorem of information theory. Hartley 
[Har28] and Shannon [Sha49] show that the capacity of a continuous real-valued 
channel without noise is infinite, yet the addition of, e.g., Gaussian noise with $\varepsilon > 0$
variance causes the channel capacity to drop to a finite amount. The Gaussian noise 
prevents all but a finite amount of information from being encoded in the bits of the 
real number. Similarly, the amount of information in a continuous observation is too 
much in general for a computer to be able to update a probabilistic model. However, 
the addition of noise, as above, is sufficient for making conditioning possible on a 
computer. 

The computability of conditioning with noise, coupled with the noncomputability
of conditioning in general, has significant implications for our ability to recover a
signal when noise is added, and suggests several interesting questions. For example,
suppose we have a uniformly computable sequence of noise $\{E_n\}_{n \in \mathbb{N}}$ with absolutely
continuous, uniformly computable densities such that the magnitude of the densities
goes to 0 in some sufficiently nice way, and consider $Y_n := U + E_n$. Such a situation
could arise, e.g., when we have a signal with noise but some way to reduce the noise
over time.

When there is a continuous version of P[(U, V)|Y], we have

$$\lim_{n \to \infty} P[(U, V)|Y_n] = P[(U, V)|Y] \quad \text{a.s.} \tag{59}$$

However, we know that the right side is, in general, noncomputable, despite the fact
that each term in the limit on the left side is computable.

This raises several questions, such as: What do bounds on how fast the sequence
$\{P[(U, V)|Y_n]\}_{n \in \mathbb{N}}$ converges to P[(U, V)|Y] tell us about the computability
of P[(U, V)|Y]? What conditions on the relationship between U and the sequence
$\{E_n\}_{n \in \mathbb{N}}$ will allow us to recover information about P[(U, V)|Y] from individual
distributions $P[(U, V)|Y_n]$?

8.4. Other Settings. Freer and Roy [FR10] show how to compute conditional distributions
in the setting of exchangeable sequences. A classic result by de Finetti
shows that exchangeable sequences of random variables are in fact conditionally 
i.i.d. sequences, conditioned on a random measure, often called the directing ran- 
dom measure. Freer and Roy describe how to transform an algorithm for sampling 
an exchangeable sequence into a rule for computing the posterior distribution of the 
directing random measure given observations. The result is a corollary of a com- 
putable version of de Finetti's theorem [FR09], and covers a wide range of common 
scenarios in nonparametric Bayesian statistics (often where no conditional density 
exists).